This is an archive of the discontinued LLVM Phabricator instance.

Paths

clang-tools-extra/clangd/index/Serialization.cpp
clang-tools-extra/clangd/unittests/SerializationTests.cpp
clang/lib/Driver/ToolChains/Clang.cpp
clang/lib/Serialization/ASTReader.cpp
clang/lib/Serialization/ASTWriter.cpp
lld/ELF/Driver.cpp
lld/ELF/InputSection.cpp
llvm/include/llvm/Object/Decompressor.h
llvm/include/llvm/ProfileData/InstrProf.h
llvm/include/llvm/Support/Compression.h
llvm/lib/MC/ELFObjectWriter.cpp
llvm/lib/ObjCopy/ELF/ELFObject.cpp
llvm/lib/Object/Decompressor.cpp
llvm/lib/ProfileData/Coverage/CoverageMappingReader.cpp
llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp
llvm/lib/ProfileData/InstrProf.cpp
llvm/lib/ProfileData/InstrProfCorrelator.cpp
llvm/lib/ProfileData/SampleProfReader.cpp
llvm/lib/ProfileData/SampleProfWriter.cpp
llvm/lib/Support/Compression.cpp
llvm/tools/llvm-mc/llvm-mc.cpp
llvm/tools/llvm-objcopy/ObjcopyOptions.cpp
llvm/unittests/ProfileData/InstrProfTest.cpp
llvm/unittests/Support/CompressionTest.cpp
Differential D130516

[llvm] compression classes
Needs Review · Public

Authored by ckissane on Jul 25 2022, 2:26 PM.

Details

Reviewers

dblaikie
alexander-shaposhnikov
rupprecht
jhenderson
MaskRay
leonardchan

Summary

Adds a CompressionKind "enum" with values:
CompressionKind::Unknown ≈ 255
CompressionKind::Zlib ≈ 1
CompressionKind::ZStd ≈ 2

Also note: OptionalCompressionKind is a typedef for Optional<CompressionKind>, and NoneType() is used to indicate no compression.

The CompressionKind "enum" has overrides for several operators:

CompressionKind::operator uint8_t() const
(Self-explanatory)

CompressionKind::operator bool() const
(true if supported, false otherwise)

bool operator==(CompressionKind a, CompressionKind b)

(Self-explanatory)
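
For readers who want to see the shape of such a fake "enum", here is a minimal sketch. Only the names CompressionKind, Unknown, Zlib, ZStd and the three operators come from the summary; the member name Id, the helper checkKinds(), and the rule that both zlib and zstd count as supported are assumptions made for illustration, not the code in the diff. The bool conversion is marked explicit here to keep the sketch free of accidental arithmetic conversions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reconstruction: only the type and constant names come from
// the summary; the bodies are assumptions.
struct CompressionKind {
  uint8_t Id;

  // Named constants stand in for enumerators.
  static const CompressionKind Unknown;
  static const CompressionKind Zlib;
  static const CompressionKind ZStd;

  // Implicit conversion to the underlying id.
  operator uint8_t() const { return Id; }

  // True if the kind is supported. Assumption for this sketch: zlib and
  // zstd are both compiled in, and Unknown is never supported.
  explicit operator bool() const { return Id == 1 || Id == 2; }
};

const CompressionKind CompressionKind::Unknown{255};
const CompressionKind CompressionKind::Zlib{1};
const CompressionKind CompressionKind::ZStd{2};

inline bool operator==(CompressionKind A, CompressionKind B) {
  return A.Id == B.Id;
}

// Small usage example exercising each operator once.
inline void checkKinds() {
  CompressionKind K = CompressionKind::Zlib;
  assert(K == CompressionKind::Zlib);          // operator==
  assert(static_cast<uint8_t>(K) == 1);        // operator uint8_t
  assert(static_cast<bool>(K));                // operator bool: supported
  assert(!CompressionKind::Unknown);           // Unknown is never supported
}
```

Because the struct holds a single uint8_t, passing it by value costs the same as passing a plain enum.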

No compression is represented simply as a None value, for use cases that take Optional<CompressionKind>.
Once you have a CompressionKind itself, you can pass it around by value (a CompressionKind is just a struct containing one uint8_t, a fake "enum").
Thanks to the operator overloading, you can write code like this:

llvm::compression::OptionalCompressionKind OptionalCompressionScheme =
    llvm::compression::getOptionalCompressionKind(CompressionSchemeId);
if (!OptionalCompressionScheme) {
  return llvm::MemoryBuffer::getMemBuffer(Blob, Name, true);
}
llvm::compression::CompressionKind CompressionScheme =
    *OptionalCompressionScheme;
if (!CompressionScheme) {
  Error("compression class " +
        (CompressionScheme->Name + " is not available").str());
  return nullptr;
}
SmallVector<uint8_t, 0> Uncompressed;
if (llvm::Error E = CompressionScheme->decompress(
        llvm::arrayRefFromStringRef(Blob), Uncompressed, Record[0])) {
  Error("could not decompress embedded file contents: " +
        llvm::toString(std::move(E)));
  return nullptr;
}
return llvm::MemoryBuffer::getMemBufferCopy(
    llvm::toStringRef(Uncompressed), Name);

(excerpt from ASTReader.cpp)

You can even make calls like

llvm::compression::CompressionKind::Zlib->decompress(...)
(This can be useful in cases like ELF decompression, where you might switch on the debug section type and handle DebugCompressionType::Z in its own switch case.)

I also believe semantics similar to the nullptr suggestion have been achieved: the bool cast returns the supported status, and the Unknown type acts as a nullptr of sorts, being always unsupported and -1 (255) as a uint8_t. getOptionalCompressionKind(uint8_t) returns Optional(Unknown) when its argument is not 0 (NoneType()), 1 (Optional(Zlib)), or 2 (Optional(ZStd)).
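
The id mapping described above can be sketched as follows. This is a simplified stand-in, not the patch's code: it assumes a plain enum class in place of the struct-based CompressionKind and std::optional in place of llvm::Optional, and the function body is inferred only from the values listed in the summary.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Simplified stand-in for the patch's CompressionKind (assumption: a plain
// enum class is enough to show the mapping).
enum class CompressionKind : uint8_t { Zlib = 1, ZStd = 2, Unknown = 255 };

// 0 means "no compression" and maps to an empty optional; any id that is
// not recognized maps to Unknown rather than to "none".
std::optional<CompressionKind> getOptionalCompressionKind(uint8_t Id) {
  switch (Id) {
  case 0:
    return std::nullopt;
  case 1:
    return CompressionKind::Zlib;
  case 2:
    return CompressionKind::ZStd;
  default:
    return CompressionKind::Unknown;
  }
}

// Usage example: "none" and "unknown" are distinguishable states.
inline void checkMapping() {
  assert(!getOptionalCompressionKind(0).has_value());
  assert(*getOptionalCompressionKind(9) == CompressionKind::Unknown);
}
```

Keeping "none" (empty optional) separate from "unknown" (a present but unsupported value) is what lets a reader pass uncompressed data through while still rejecting ids it does not recognize.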

In places where explicit compression must be used, a CompressionKind can be passed around; possibly absent (sometimes none) compression is represented as llvm::Optional<CompressionKind>, which I have type-aliased as OptionalCompressionKind.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ckissane created this revision. Jul 25 2022, 2:26 PM

Herald added a reviewer: alexander-shaposhnikov. Jul 25 2022, 2:26 PM

Herald added a reviewer: rupprecht.

Herald added a reviewer: jhenderson.

Herald added a reviewer: MaskRay.

Herald added a project: Restricted Project.

Herald added subscribers: wenlei, usaxena95, kadircet and 4 others.

ckissane requested review of this revision. Jul 25 2022, 2:26 PM

Herald added projects: Restricted Project, Restricted Project, Restricted Project. Jul 25 2022, 2:26 PM

Herald added subscribers: cfe-commits, llvm-commits, StephenFan.

format
merge fix

ckissane edited the summary of this revision. Jul 25 2022, 2:31 PM

ckissane mentioned this in D130506: [Support] Add llvm::compression::{getReasonIfUnsupported,compress,decompress}.

dblaikie added inline comments. Jul 25 2022, 2:33 PM

clang/lib/Serialization/ASTWriter.cpp
2003–2004	Doesn't this cause slicing & end up with the base implementation? (also the base class `CompressionAlgorithm` has no virtual functions, so I'm not sure how this is meant to work - does this code all work? Then I must be missing some things - how does this work?)

ckissane added a reviewer: leonardchan. Jul 25 2022, 2:35 PM

Thanks for experimenting with the refactoring. My gut feeling is that:

for inheritance, llvm/lib/Support/Compression.cpp introduces quite a bit of complexity.
BestSpeedCompression/DefaultCompression/BestSizeCompression may be kind of weird. Not all algorithms need all three.
this new interface does not make parallel compression/decompression easier.

I mentioned this on https://groups.google.com/g/generic-abi/c/satyPkuMisk/m/xRqMj8M3AwAJ

I know the paradox of choice :) Certainly we should not add a plethora of bzip2, xz, lzo, brotli, etc. to generic-abi. I don't think users are so fond of using a different format for a slightly different read/write/compression ratio/memory usage need. They want to pick one format which performs well across a wide variety of workloads. (Adding a new format also introduces complexity on their build system side. They need to teach the DWARF consumers they use. It's not a small undertaking.)

I think for a long time llvm/lib/Support/Compression.cpp will stay with just zlib and zstd. There is a question of whether we want to introduce the heavier abstraction now. If an llvm-project use case needs an alternative (it would need very strong arguments for not using zstd), it can take the other approach of not using llvm/Support/Compression.h.

ckissane added inline comments. Jul 25 2022, 3:05 PM

clang/lib/Serialization/ASTWriter.cpp
2003–2004	You are correct that this patch does not yet pass around pointers to instances of the classes. However, because I don't pass pointers and the compression classes are currently repetitive, this still functions correctly. In short, a follow-up patch (which I will upload shortly) will convert this to using class instances and passing those around, including reworking functions throughout llvm-project to take advantage of this. I am taking this two-step approach to avoid making an already large patch larger. Let me know if you have any concerns or ideas.

dblaikie added inline comments. Jul 25 2022, 3:25 PM

clang/lib/Serialization/ASTWriter.cpp
2003–2004	But I'm not sure how this patch works correctly - wouldn't the line below (`CompressionScheme.supported()`) call `CompressionAlgorithm::supported()` which always returns false?
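
To make the slicing concern concrete, here is a small self-contained sketch. The class names echo the review, but the bodies and the helper functions supportedByValue/supportedByRef are hypothetical and only illustrate the language rule: copying a derived object into a base object by value discards the derived part, and a non-virtual `supported()` is resolved against the static type.

```cpp
#include <cassert>

// Non-virtual base, as described in the comment above.
struct CompressionAlgorithm {
  bool supported() const { return false; }
};

// This hides the base function; it does not override it.
struct ZlibCompressionAlgorithm : CompressionAlgorithm {
  bool supported() const { return true; }
};

// Taking the base by value slices off the derived part, so the base
// implementation always runs.
bool supportedByValue(CompressionAlgorithm Scheme) {
  return Scheme.supported();
}

// The usual fix: a virtual function called through a reference or pointer.
struct VirtualAlgorithm {
  virtual ~VirtualAlgorithm() = default;
  virtual bool supported() const { return false; }
};

struct VirtualZlib : VirtualAlgorithm {
  bool supported() const override { return true; }
};

bool supportedByRef(const VirtualAlgorithm &Scheme) {
  return Scheme.supported();
}

// Usage example contrasting the two call paths.
inline void checkSlicing() {
  assert(!supportedByValue(ZlibCompressionAlgorithm{})); // sliced
  assert(supportedByRef(VirtualZlib{}));                 // dispatched
}
```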

Harbormaster completed remote builds in B177466: Diff 447469. Jul 25 2022, 3:45 PM

ckissane added inline comments. Jul 26 2022, 2:12 PM

clang/lib/Serialization/ASTWriter.cpp
2003–2004	good catch

feat: compression class + clang ast serial zstd option + zstd elf + zstd objcopy support
[MC] fix merge issues with removal of GNU compression
fix compression class inheritance and passing
update usages of -enable-name-compression=false to be -name-compression=none
do not exit on parse of bad compression cl::opt

Herald added a project: Restricted Project. Jul 26 2022, 3:57 PM

Herald added subscribers: Restricted Project, Enna1, mgorny.

ckissane retitled this revision from [Support] compression classes to compression classes. Jul 26 2022, 4:02 PM

Harbormaster completed remote builds in B177736: Diff 447863. Jul 26 2022, 4:59 PM

fix compression class usage in some profile data tests

dblaikie mentioned this in D130458: [llvm-objcopy] Support --{,de}compress-debug-sections for zstd. Jul 27 2022, 1:18 PM

Any chance this could be split up to be more directly comparable to https://reviews.llvm.org/D130458 ?

clang-tools-extra/clangd/index/Serialization.cpp
35	We're generally trying to avoid global ctors in LLVM. So at most this should be a static local variable in a function that accesses the algorithm (though perhaps these "compression algorithm" classes shouldn't be accessible directly, only through singleton accessors defined alongside them; there's no reason for LLVM to contain more than one instance of ZlibCompressionAlgorithm, I think?)
llvm/include/llvm/Support/Compression.h
72	Rather than `supported()` maybe the accessor functions could return nullptr when support isn't available? if (CompressionAlgorithm *A = getZstdCompressionScheme()) etc. Though I guess that doesn't allow for a default implementation - I guess an alternative function could be `CompressionAlgorithm& getCompressionSchemeOrNone(Zstd)` which always gives a valid `CompressionAlgorithm` by giving the do-nothing compression algorithm when the specified one is not available. But I guess we don't generally want to silently fallback to null compression, because the streams we're producing always need to know if they have to emit headers, etc, or not? So maybe there's no need for a default?
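
A minimal sketch of the null-returning accessor idea, under stated assumptions: the class bodies, the `name()` method, the `describe()` helper, and the `BuiltWithZstd` parameter are all hypothetical. The parameter stands in for whatever build-time configuration check a real implementation would use, so that both outcomes can be exercised in one program.

```cpp
#include <cassert>
#include <string>

class CompressionAlgorithm {
public:
  virtual ~CompressionAlgorithm() = default;
  virtual std::string name() const = 0;
};

class ZstdAlgorithm final : public CompressionAlgorithm {
public:
  std::string name() const override { return "zstd"; }
};

// Returns the singleton when zstd is usable, nullptr otherwise.
// BuiltWithZstd is a hypothetical stand-in for a build-configuration check.
const CompressionAlgorithm *getZstdCompressionScheme(bool BuiltWithZstd) {
  static const ZstdAlgorithm Zstd{}; // function-local static, no global ctor
  return BuiltWithZstd ? &Zstd : nullptr;
}

// The availability test and the use share one variable, so a caller cannot
// reach the algorithm without going through the check.
std::string describe(bool BuiltWithZstd) {
  if (const CompressionAlgorithm *A = getZstdCompressionScheme(BuiltWithZstd))
    return "using " + A->name();
  return "zstd unavailable";
}

// Usage example covering both outcomes.
inline void checkAccessor() {
  assert(describe(true) == "using zstd");
  assert(describe(false) == "zstd unavailable");
}
```

There is deliberately no do-nothing fallback here: a caller that gets nullptr has to decide what to emit, which matches the point that producers need to know whether headers are written.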

tiny cleanup of using NoneCompressionAlgorithm::AlgorithmId

In D130516#3683384, @dblaikie wrote:

Any chance this could be split up to be more directly comparable to https://reviews.llvm.org/D130458 ?

yes definitely! Doing so now!

Harbormaster completed remote builds in B177937: Diff 448146. Jul 27 2022, 2:45 PM

fix compression inheritance and some more compression class helpers

ckissane edited the summary of this revision. Jul 27 2022, 2:59 PM

ckissane added inline comments.

llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp
55	note the helpers such as when(bool), whensupported() and notNone()

use CompressionScheme->notNone() in InstrProf

ckissane added inline comments. Jul 27 2022, 3:10 PM

clang-tools-extra/clangd/index/Serialization.cpp
35	your idea seems correct to me, however some algorithms, such as zstd, support running in multiple threads, so this might influence our decision.

dblaikie added inline comments. Jul 27 2022, 3:16 PM

clang-tools-extra/clangd/index/Serialization.cpp
35	So long as the compression algorithm objects are stateless, this would be acceptable - such objects would be thread safe for multiple concurrent users.

ckissane added a child revision: D130667: feat: use compression class for: clang ast serial zstd option + prof data compression variants + zstd elf + ztsd objcopy support. Jul 27 2022, 3:27 PM

ckissane retitled this revision from compression classes to [llvm] compression classes. Jul 27 2022, 3:31 PM

marked outdated comments as done

dblaikie added inline comments. Jul 27 2022, 3:43 PM

llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp
51–52	This seems a bit too convoluted for me. I'd think something like:
  if (DoInstrProfNameCompression) {
    if (CompressionAlgorithm *C = getZlibCompressionAlgorithm())
      C->compress(...);
  }
Or even have `getCompressionAlgorithm(SupportCompressionType::Zlib)` (like that could be the only entry point - no need for algorithm-specific accessors, that function would have one switch over `SupportCompressionType`, returning null if Unknown or Null were passed, or if the requested algorithm was not available).
I'm not sure I understand the 'when'/'whenSupported' stuff and whether there's any value/need for more details to be communicated in the not-available case other than 'false'/null/nothing (like, if it needs to communicate a reason for non-availability, that's more involved than returning null from some factory/accessor function).
llvm/lib/Support/Compression.cpp
98–102	Maybe these don't need to be static members - if there are singleton instances of the algorithms, they could be members of those singletons instead (possibly in the base/impl class - the derived classes could pass these values into the base ctor to initialize members in the impl or base - they could even be const public members, avoid the need for accessors (at least avoiding the need for virtual accessors, but hopefully avoiding accessors entirely))

Harbormaster completed remote builds in B177965: Diff 448181. Jul 27 2022, 4:33 PM

leonardchan added inline comments. Jul 28 2022, 11:40 AM

clang-tools-extra/clangd/index/Serialization.cpp
196–197	Will this leak?
llvm/include/llvm/Support/Compression.h
57–59	Does the `uncompress` version of this just end up calling into the other `uncompress` function? If so, we could probably just have one `decompress` virtual method here and the one that accepts a `SmallVectorImpl` just calls into the virtual `decompress` rather than have two virtual methods that would do the same thing. It looks like you've done that in `CompressionAlgorithmImpl`, but I think it could be moved here.
61–62	Perhaps add some comments for these functions? At least for me, it's not entirely clear what these are for.
67–68	Perhaps it would be simpler to just have the individual subclasses inherit from `CompressionAlgorithm` rather than have them all go through `CompressionAlgorithmImpl`? It looks like each child class with methods like `getAlgorithmId` can just return the static values themselves rather than passing them up to a parent to be returned. I think unless some static polymorphism is needed here, CRTP might not be needed here.
llvm/lib/Support/Compression.cpp
68	Does `NoneCompressionAlgorithm` imply there's no compression at all? If so, I would think these methods should be empty.
170–171	Is the purpose of `UnknownCompressionAlgorithm` to be the default instance here? If so, would it be better perhaps to just omit this and have an `llvm_unreachable` in the `default` case below? I would assume users of this function should just have the right compression scheme ID they need and any error checking on if something is a valid ID would be done before calling this.
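
The suggestion about collapsing the two decompress virtuals into one can be sketched like this. It is an illustration only: std::vector stands in for SmallVectorImpl, bool stands in for llvm::Error, and the "stored" codec that just copies bytes is a hypothetical placeholder for a real algorithm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class CompressionAlgorithm {
public:
  virtual ~CompressionAlgorithm() = default;

  // The single virtual entry point: write into a caller-provided buffer of
  // capacity Size and set Size to the number of bytes produced.
  virtual bool decompress(const std::vector<uint8_t> &Input, uint8_t *Output,
                          size_t &Size) const = 0;

  // Non-virtual convenience overload that sizes the vector and forwards to
  // the virtual function above, so subclasses implement only one method.
  bool decompress(const std::vector<uint8_t> &Input,
                  std::vector<uint8_t> &Output,
                  size_t UncompressedSize) const {
    Output.resize(UncompressedSize);
    size_t Size = UncompressedSize;
    if (!decompress(Input, Output.data(), Size))
      return false;
    Output.resize(Size);
    return true;
  }
};

// Hypothetical toy codec: "decompression" is a bounded copy.
class StoredAlgorithm final : public CompressionAlgorithm {
public:
  using CompressionAlgorithm::decompress; // keep the vector overload visible

  bool decompress(const std::vector<uint8_t> &Input, uint8_t *Output,
                  size_t &Size) const override {
    if (Input.size() > Size)
      return false; // destination too small
    std::copy(Input.begin(), Input.end(), Output);
    Size = Input.size();
    return true;
  }
};

// Usage example: the vector overload trims the output to the real size.
inline void checkDecompress() {
  StoredAlgorithm A;
  std::vector<uint8_t> In{1, 2, 3};
  std::vector<uint8_t> Out;
  assert(A.decompress(In, Out, 8));
  assert(Out == In);
}
```

The using-declaration matters: without it, the override in the derived class would hide the base-class convenience overload for callers holding a StoredAlgorithm.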

make compression singletons

fix usage of CompressionAlgorithmFromId

Harbormaster completed remote builds in B178174: Diff 448473. Jul 28 2022, 5:36 PM

I'd like to make a few arguments for the current namespace+free function design, as opposed to the class+member function design as explored in this patch (but thanks for the exploration!).
Let's discuss several use cases.

(a) if a use case just calls compress/uncompress. The class design has slightly more boilerplate as it needs to get the algorithm class, a new instance, or a singleton instance.
For each new use, the number of lines may not differ, but the involvement of a static class member or an instance makes the reader wonder whether the object will be reused or thrown away.
There is some slight cognitive burden.
The class design has a non-trivial one-shot cost to have a function returning the singleton instance.

(b) zlib compress/uncompress immediately following an availability check.

// free function
if (!compression::zlib::isAvailable())
  errs() << "cannot compress: " << compression::zlib::buildConfigurationHint();

// class
auto *algo = !compression::ZlibCompression;
if (!algo->isAvailable()) {
  errs() << "cannot compress: " << algo->buildConfigurationHint();
}

(c) zlib/zstd compress/uncompress immediately following an availability check.

// free function
if (!compression::isAvailable(format))
  errs() << "cannot compress: " << compression::buildConfigurationHint(format);

// class
std::unique_ptr<Compression> algo = make_compression(format);
if (!algo->isAvailable()) {
  errs() << "cannot compress: " << algo->buildConfigurationHint();
}

(d) compress/uncompress and an availability check are apart.

// free function
no change

// class
Store (the pointer to) the algorithm object somewhere, or construct the pointer/object twice.

leonardchan added inline comments. Jul 29 2022, 12:03 PM

llvm/lib/Support/Compression.cpp
30–32	Perhaps for each of these, you could instead have something like:
  ZStdCompressionAlgorithm *getZStdCompressionAlgorithm() {
    static ZStdCompressionAlgorithm *instance = new ZStdCompressionAlgorithm;
    return instance;
  }
This way the instances are only new'd when they're actually used.

In D130516#3688123, @MaskRay wrote:

I'd like to make a few arguments for the current namespace+free function design, as opposed to the class+member function design as explored in this patch (but thanks for the exploration!).
Let's discuss several use cases.

(a) if a use case just calls compress/uncompress. The class design has slightly more boilerplate as it needs to get the algorithm class, a new instance, or a singleton instance.
For each new use, the number of lines may not differ, but the involvement of a static class member or an instance makes the reader wonder whether the object will be reused or thrown away.
There is some slight cognitive burden.
The class design has a non-trivial one-shot cost to have a function returning the singleton instance.

Though there must've been a condition that dominates this use somewhere - I'd suggest that condition could be where the algorithm is retrieved, and then passed to this code to use unconditionally.

If the algorithm object is const and raw pointers/references are used, I think it makes it clear to the reader that there's no ownership here, and it's not stateful when compressing/decompressing.

(b) zlib compress/uncompress immediately following an availability check.

// free function
if (!compression::zlib::isAvailable())
  errs() << "cannot compress: " << compression::zlib::buildConfigurationHint();

// class
auto *algo = !compression::ZlibCompression;
if (!algo->isAvailable()) {
  errs() << "cannot compress: " << algo->buildConfigurationHint();
}

I think maybe this code might end up looking like:

Algo *algo = getAlgo(Zlib)
if (!algo)
  errs() ...

It's possible that this function would return non-null even for a non-available algorithm if we wanted to communicate other things (like the cmake macro name to enable to add the functionality)

(c) zlib/zstd compress/uncompress immediately following an availability check.

// free function
if (!compression::isAvailable(format))
  errs() << "cannot compress: " << compression::buildConfigurationHint(format);

// class
std::unique_ptr<Compression> algo = make_compression(format);
if (!algo->isAvailable()) {
  errs() << "cannot compress: " << algo->buildConfigurationHint();
}

I don't think there's a need for unique_ptr here - algorithms can be constant singletons, referenced via raw const pointers/references without ownership.

& this example doesn't include the code that does the compression/decompression, which seems part of the discussion & part I find nice in that the type of compression used matches the type used in the check necessarily rather than being passed into two APIs independently.

(d) compress/uncompress and an availability check are apart.
// free function
no change

// class
Store (the pointer to) the algorithm object somewhere, or construct the pointer/object twice.

The benefit here is that it's harder for the test to become separated from the usage - for the usage to end up becoming unconditional/incorrectly guarded.

llvm/lib/Support/Compression.cpp
30–32	Yep, I'd mentioned/suggested that (so, seconding here) elsewhere encouraging these to be singletons: https://reviews.llvm.org/D130516#3683384
And they don't even need to be 'new'd in that case, this would be fine:
  ZstdCompressionAlgorithm &getZstdCompressionAlgorithm() {
    static ZstdCompressionAlgorithm C;
    return C;
  }
Though I think maybe we don't need individual access to the algorithms, and it'd be fine to have only a single entry point like this:
  CompressionAlgorithm *getCompressionAlgorithm(DebugCompressionType T) {
    switch (T) {
    case DebugCompressionType::ZStd: {
      static zstd::CompressionAlgorithm Zstd;
      if (zstd::isAvailable())
        return &Zstd;
    }
    ...
    }
    return nullptr;
  }
(or, possibly, we want to return non-null even if it isn't available, if we include other things (like the configure macro name - so callers can use that name to print helpful error messages - but then they have to explicitly check if the algorithm is available after the call))
98–102	I don't think there's particular value in these being constexpr members - and maybe we don't need these at all just yet/could leave them out for now? It'd be great to reduce this whole patch to something more comparable with https://reviews.llvm.org/D130458 If you have plans for these other properties it might be helpful to understand what they are - they might help inform the design discussion. (if we are keeping these properties, including the string version of the name, etc - I'd think the way to do it would be for the base algorithm class to have non-static members to store these, and derived algorithm classes to pass the values into the base ctor to be stored in the members - they could even be const public members of the algorithm to be accessed directly, rather than via accessor functions (& certainly not virtual accessor functions))
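
A rough sketch of the "pass the values into the base ctor" idea from the comment above. The member names Name and AlgorithmId and the id values 1 and 2 are illustrative assumptions (the ids simply mirror the Zlib and ZStd values in the summary); the actual patch may organize this differently.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

class CompressionAlgorithm {
public:
  // Const public data members: readable directly, no accessors and no
  // virtual dispatch needed to query them.
  const std::string Name;
  const uint8_t AlgorithmId;

  virtual ~CompressionAlgorithm() = default;

protected:
  // Only derived classes can construct the base, and they must supply the
  // per-algorithm properties.
  CompressionAlgorithm(std::string Name, uint8_t Id)
      : Name(std::move(Name)), AlgorithmId(Id) {}
};

class ZlibCompressionAlgorithm final : public CompressionAlgorithm {
public:
  ZlibCompressionAlgorithm() : CompressionAlgorithm("zlib", 1) {}
};

class ZstdCompressionAlgorithm final : public CompressionAlgorithm {
public:
  ZstdCompressionAlgorithm() : CompressionAlgorithm("zstd", 2) {}
};

// Usage example: properties are visible through a base reference.
inline void checkProperties() {
  ZlibCompressionAlgorithm Zlib;
  const CompressionAlgorithm &Base = Zlib;
  assert(Base.Name == "zlib");
  assert(Base.AlgorithmId == 1);
}
```

One trade-off worth noting: const data members make the objects non-assignable, which is harmless for singletons that are only ever referenced.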

MaskRay mentioned this in rGce6dd4e835a3: Revert D130458 "[llvm-objcopy] Support --{,de}compress-debug-sections for zstd". Jul 29 2022, 3:47 PM

ckissane added inline comments. Aug 1 2022, 10:27 AM

llvm/lib/Support/Compression.cpp
30–32	they currently already have singleton behavior i.e. `llvm::compression::ZStdCompressionAlgorithm::Instance` is the only place `new ZStdCompressionAlgorithm()` can be put into because the constructor is protected. I'd rather not achieve "This way the instances are only new'd when they're actually used." Because the rewards of that are relatively small, but it will make the code more verbose, I think the current pattern allows the best of both worlds of the namespace approach: (`llvm::compression::zlib::compress` becomes `llvm::compression::ZlibCompression->compress`) but they can be passed as class instances.

(still lots of outstanding comments from my last round, so I won't reiterate those - but waiting for some responses to them)

llvm/lib/Support/Compression.cpp
30–32	Global constructors are to be avoided in LLVM: https://llvm.org/docs/CodingStandards.html#do-not-use-static-constructors (also these objects don't need to be dynamically allocated with `new` - they can be directly allocated (as static locals though, not as globals))

ckissane added a subscriber: phosek. Aug 1 2022, 2:33 PM

feat compression "enum" with methods

In D130516#3688236, @dblaikie wrote:

In D130516#3688123, @MaskRay wrote:

I'd like to make a few arguments for the current namespace+free function design, as opposed to the class+member function design as explored in this patch (but thanks for the exploration!).
Let's discuss several use cases.

(a) if a use case just calls compress/uncompress. The class design has slightly more boilerplate as it needs to get the algorithm class, a new instance, or a singleton instance.
For each new use, the number of lines may not differ, but the involvement of a static class member or an instance makes the reader wonder whether the object will be reused or thrown away.
There is some slight cognitive burden.
The class design has a non-trivial one-shot cost to have a function returning the singleton instance.

Though there must've been a condition that dominates this use somewhere - I'd suggest that condition could be where the algorithm is retrieved, and then passed to this code to use unconditionally.

If the algorithm object is const and raw pointers/references are used, I think it makes it clear to the reader that there's no ownership here, and it's not stateful when compressing/decompressing.

A pointer to a singleton compression class is isomorphic to an enum class CompressionType variable.
Using an enum variable doesn't lose any usage pattern we can do with a pointer to a singleton compression class.
An enum variable allows more patterns, as the allowed values are enumerable (we don't need to worry about -Wswitch for the uses).
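
One way to picture the isomorphism claim is a one-to-one mapping between enum values and singleton objects. The sketch below is hypothetical throughout: the two-value enum, the struct fields, the function names, and the availability values are assumptions chosen for illustration, not code from either design under review.

```cpp
#include <cassert>

enum class CompressionType { Zlib, Zstd };

// Free-function design: behaviour is selected by switching on the enum.
// Availability values are assumptions for this sketch.
bool isAvailable(CompressionType T) {
  switch (T) {
  case CompressionType::Zlib:
    return true;
  case CompressionType::Zstd:
    return false;
  }
  return false; // not reached for valid enumerators
}

// Class design, reduced to data: one immutable singleton per enum value.
struct CompressionAlgorithm {
  CompressionType Type;
  bool Available;
};

// Enum value -> singleton pointer. The Type field gives the inverse
// mapping, which is why either representation carries the same information.
const CompressionAlgorithm *getAlgorithm(CompressionType T) {
  static const CompressionAlgorithm Zlib{CompressionType::Zlib, true};
  static const CompressionAlgorithm Zstd{CompressionType::Zstd, false};
  switch (T) {
  case CompressionType::Zlib:
    return &Zlib;
  case CompressionType::Zstd:
    return &Zstd;
  }
  return nullptr; // not reached for valid enumerators
}

// Usage example: round-tripping through the pointer loses nothing.
inline void checkIsomorphism() {
  const CompressionType All[] = {CompressionType::Zlib, CompressionType::Zstd};
  for (CompressionType T : All) {
    assert(getAlgorithm(T)->Type == T);
    assert(getAlgorithm(T)->Available == isAvailable(T));
  }
}
```

The mapping shows equal expressive power only; it does not settle which form makes the availability check harder to forget.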

Say, we do

auto *algo = !compression::ZlibCompression;
if (!algo)
  ...


algo->compress(...);

either together or apart, the result is similar to the following but with (IMO) slightly larger cognitive burden:

if (!compression::isAvailable(format))
  ...

compression::compress(format);

(b) zlib compress/uncompress immediately following an availability check.
// free function
if (!compression::zlib::isAvailable())
  errs() << "cannot compress: " << compression::zlib::buildConfigurationHint();

// class
auto *algo = !compression::ZlibCompression;
if (!algo->isAvailable()) {
  errs() << "cannot compress: " << algo->buildConfigurationHint();
}
I think maybe this code might end up looking like:
Algo *algo = getAlgo(Zlib)
if (!algo)
  errs() ...
It's possible that this function would return non-null even for a non-available algorithm if we wanted to communicate other things (like the cmake macro name to enable to add the functionality)

I think this is similarly achieved with an enum variable.
With the class based approach, a pointer has a static type of the ancestor compression class and a dynamic type of any possible algorithm.
This is not different from that: the enum variable may have a value the enum class supports.

(c) zlib/zstd compress/uncompress immediately following an availability check.
// free function
if (!compression::isAvailable(format))
  errs() << "cannot compress: " << compression::buildConfigurationHint(format);

// class
std::unique_ptr<Compression> algo = make_compression(format);
if (!algo->isAvailable()) {
  errs() << "cannot compress: " << algo->buildConfigurationHint();
}
I don't think there's a need for unique_ptr here - algorithms can be constant singletons, referenced via raw const pointers/references without ownership.

& this example doesn't include the code that does the compression/decompression, which seems part of the discussion & part I find nice in that the type of compression used matches the type used in the check necessarily rather than being passed into two APIs independently.

Thanks for clarification. Then this fits my "singleton compression classes are isomorphic to an enum CompressionType variable" argument :)

(d) compress/uncompress and an availability check are apart.
// free function
no change

// class
Store (the pointer to) the algorithm object somewhere, or construct the pointer/object twice.
The benefit here is that it's harder for the test to become separated from the usage - for the usage to end up becoming unconditional/incorrectly guarded.

make a zlib corruption check specific

Harbormaster completed remote builds in B178804: Diff 449355. Aug 2 2022, 11:42 AM

ckissane updated this revision to Diff 449365. Aug 2 2022, 11:42 AM

trim down compression api: remove supported()

Harbormaster completed remote builds in B178809: Diff 449365. Aug 2 2022, 11:54 AM

ckissane edited the summary of this revision. Aug 2 2022, 12:01 PM

ckissane edited the summary of this revision. Aug 2 2022, 12:09 PM

In D130516#3694151, @MaskRay wrote:

In D130516#3688236, @dblaikie wrote:

In D130516#3688123, @MaskRay wrote:

I'd like to make a few arguments for the current namespace+free function design, as opposed to the class+member function design as explored in this patch (but thanks for the exploration!).
Let's discuss several use cases.

(a) if a use case just calls compress/uncompress. The class design has slightly more boilerplate as it needs to get the algorithm class, a new instance, or a singleton instance.
For each new use, the number of lines may not differ, but the involvement of a static class member or an instance makes the reader wonder whether the object will be reused or thrown away.
There is some slight cognitive burden.
The class design has a non-trivial one-shot cost to have a function returning the singleton instance.

Though there must've been a condition that dominates this use somewhere - I'd suggest that condition could be where the algorithm is retrieved, and then passed to this code to use unconditionally.

If the algorithm object is const and raw pointers/references are used, I think it makes it clear to the reader that there's no ownership here, and it's not stateful when compressing/decompressing.

A pointer to a singleton compression class is isomorphic to an enum class CompressionType variable.

I don't mean to suggest that either design is fundamentally more or less functional - I'm totally OK with/agree that both design directions allow the implementation of all the desired final/end-user-visible functionality.

I'm trying to make a point about which, I think, achieves that goal in a "better" way - that's the space of design discussions, I think - what kinds of (developer, maintenance, etc) costs different designs incur.

Using an enum variable doesn't lose any usage pattern we can do with a pointer to a singleton compression class.

I agree that either design doesn't change what's possible - I do, though, think that the "usage patterns" are meaningfully different between the two designs.

An enum variable allows more patterns, as the allowed values are enumerable (we don't need to worry about -Wswitch for the uses).

Say, we do
auto *algo = !compression::ZlibCompression;
if (!algo)
  ...


algo->compress(...);
either together or apart, the result is similar to the following but with (IMO) slightly larger cognitive burden:
if (!compression::isAvailable(format))
  ...

compression::compress(format);

Specifically two APIs that are related (it's important/necessary to check for availability before calling compress or decompress) in their contracts but unrelated in their API use makes it easier to misuse the APIs and have a situation where the availability check doesn't cover the usage. That's what I think is important/important to discuss here.

(b) zlib compress/uncompress immediately following an availability check.
// free function
if (!compression::zlib::isAvailable())
  errs() << "cannot compress: " << compression::zlib::buildConfigurationHint();

// class
auto *algo = !compression::ZlibCompression;
if (!algo->isAvailable()) {
  errs() << "cannot compress: " << algo->buildConfigurationHint();
}
I think maybe this code might end up looking like:
Algo *algo = getAlgo(Zlib)
if (!algo)
  errs() ...
It's possible that this function would return non-null even for a non-available algorithm if we wanted to communicate other things (like the cmake macro name to enable to add the functionality)
I think this is similarly achieved with an enum variable.
With the class based approach, a pointer has a static type of the ancestor compression class and a dynamic type of any possible algorithm.
This is not different from that: the enum variable may have a value the enum class supports.

I agree that the code is similar in either case, but with a small difference that is important to me - that accessing the algorithm necessarily (to some degree - you could still have code that doesn't test the condition/dereferences null, the same way that code can dereference an empty Optional without checking first - but at least the API I'm suggesting makes clear there's a connection between availability and usage).

(c) zlib/zstd compress/uncompress immediately following an availability check.
// free function
if (!compression::isAvailable(format))
  errs() << "cannot compress: " << compression::buildConfigurationHint(format);

// class
std::unique_ptr<Compression> algo = make_compression(format);
if (!algo->isAvailable()) {
  errs() << "cannot compress: " << algo->buildConfigurationHint();
}
I don't think there's a need for unique_ptr here - algorithms can be constant singletons, referenced via raw const pointers/references without ownership.

& this example doesn't include the code that does the compression/decompression, which seems part of the discussion & part I find nice in that the type of compression used matches the type used in the check necessarily rather than being passed into two APIs independently.
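As a rough illustration of the point above (not code from this patch), the sketch below couples the availability check to the use by returning a raw const pointer to a stateless singleton. `Algorithm`, `ToyRunLength`, `Format`, `getAlgorithm` and `roundTrips` are hypothetical names, and a toy run-length codec stands in for zlib so the example has no dependencies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the interface under discussion; not the LLVM API.
class Algorithm {
public:
  virtual ~Algorithm() = default;
  virtual std::vector<uint8_t> compress(const std::string &In) const = 0;
  virtual std::string decompress(const std::vector<uint8_t> &In) const = 0;
};

// Toy run-length codec (assumption: it only plays the role of zlib here).
class ToyRunLength final : public Algorithm {
public:
  std::vector<uint8_t> compress(const std::string &In) const override {
    std::vector<uint8_t> Out;
    for (std::size_t I = 0; I < In.size();) {
      std::size_t Run = 1;
      while (I + Run < In.size() && In[I + Run] == In[I] && Run < 255)
        ++Run;
      Out.push_back(static_cast<uint8_t>(Run));   // run length, 1..255
      Out.push_back(static_cast<uint8_t>(In[I])); // repeated byte
      I += Run;
    }
    return Out;
  }
  std::string decompress(const std::vector<uint8_t> &In) const override {
    assert(In.size() % 2 == 0 && "expected (count, byte) pairs");
    std::string Out;
    for (std::size_t I = 0; I + 1 < In.size(); I += 2)
      Out.append(static_cast<std::size_t>(In[I]), static_cast<char>(In[I + 1]));
    return Out;
  }
};

enum class Format { Zlib, Zstd };

// Returns a non-owning pointer to a stateless singleton, or nullptr when the
// format is not built in. In this sketch Zstd plays the "not built in" case.
const Algorithm *getAlgorithm(Format F) {
  static const ToyRunLength Zlib{};
  return F == Format::Zlib ? &Zlib : nullptr;
}

// The check and the use share one pointer, so they cannot name different formats.
bool roundTrips(Format F, const std::string &Text) {
  if (const Algorithm *A = getAlgorithm(F))
    return A->decompress(A->compress(Text)) == Text;
  return false; // format unavailable
}
```

Because the pointer is const and non-owning, a reader can tell at the call site that nothing is allocated or mutated; the free-function form would instead pass `F` to two separate calls.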
Thanks for clarification. Then this fits my "singleton compression classes are isomorphic to an enum CompressionType variable" argument :)

I don't understand what you're saying here. Could you rephrase/expand a bit?

CompressionKind: clean up param names to == op

@dblaikie, @MaskRay I think I have worked out something that is the best of both worlds:
none compression is represented simply as a none type for use cases that will use Optional<CompressionKind>.
once you have a CompressionKind itself you can pass it around as a value (because a CompressionKind is just a struct containing one uint8_t (a fake "enum")).
due to my operator overloading, you can do stuff like this:

llvm::compression::OptionalCompressionKind OptionalCompressionScheme =
          llvm::compression::getOptionalCompressionKind(CompressionSchemeId);
      if (!OptionalCompressionScheme) {
        return llvm::MemoryBuffer::getMemBuffer(Blob, Name, true);
      }
      llvm::compression::CompressionKind CompressionScheme =
          *OptionalCompressionScheme;
      if (!CompressionScheme) {
        Error("compression class " +
              (CompressionScheme->getName() + " is not available").str());
        return nullptr;
      }
      SmallVector<uint8_t, 0> Uncompressed;
      if (llvm::Error E = CompressionScheme->decompress(
              llvm::arrayRefFromStringRef(Blob), Uncompressed, Record[0])) {
        Error("could not decompress embedded file contents: " +
              llvm::toString(std::move(E)));
        return nullptr;
      }
      return llvm::MemoryBuffer::getMemBufferCopy(
          llvm::toStringRef(Uncompressed), Name);

(excerpt from ASTReader.cpp)

you can even do calls like

ckissane edited the summary of this revision. Aug 2 2022, 12:18 PM

ckissane edited the summary of this revision. Aug 2 2022, 12:25 PM

The current code here still seems more complicated than I'd prefer - looks like currently the size/speed/default levels are currently unused, so maybe we can omit those for now, knowing they will be added?
And the CompressionKind with all its operator overloads seems like a lot of surface area that is pretty non-obvious for usage - boolean testable, logical operator overloads, etc.
Could we have only one decompress/compress function each, for now?
& maybe leave out the name/enum from the base class for now, add it in later (& I think I mentioned in another comment those properties can be non-virtual, maybe even direct const members - passed into the base through the ctor from the derived class)

Maybe it's easier if I either post a patch, or at least more explicitly flesh out what I'm picturing/proposing/suggesting:
Header:

struct CompressionAlgorithm {
  virtual void Compress(...);
  virtual void Decompress(...);
};
enum class CompressionType {
  Zlib, Zstd
};
CompressionAlgorithm *getCompressionAlgorithm(CompressionType);

Implementation:

#if LLVM_ENABLE_ZLIB
struct ZlibCompressionAlgorithm : CompressionAlgorithm {
  void Compress(...) { ... }
  void Decompress(...) { ... }
};
#endif
...
CompressionAlgorithm *getCompressionAlgorithm(CompressionType T) {
  switch (T) {
  case CompressionType::Zlib: {
#if LLVM_ENABLE_ZLIB
    static ZlibCompressionAlgorithm A;
    return &A;
#else
    break;
#endif
  }
...
  }
  return nullptr;
}

Usage:

if (CompressionAlgorithm *C = getCompressionAlgorithm(CompressionType::Zlib)) {
  C->Compress(...);
}

And, yeah, I think it'd be suitable to eventually add name, type, size/speed/default levels:

struct CompressionAlgorithm {
  const StringRef Name;
  const CompressionType Type;
  const int DefaultLevel;
  const int BestSizeLevel;
  const int BestSpeedLevel;
  virtual void Compress(...);
  virtual void Decompress(...);
protected:
  CompressionAlgorithm(StringRef Name, CompressionType Type, ...) : Name(Name), Type(Type), ... {}
};

struct ZlibCompressionAlgorithm : CompressionAlgorithm {
  ZlibCompressionAlgorithm() : CompressionAlgorithm("zlib", CompressionType::Zlib, 5, 10, 1) { }
  /* as before */
};
...

Though those can be added as needed - good to keep in mind that they're a useful direction to go, but might simplify the review/discussion to omit them for now.
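For concreteness, here is a compilable variant of the "const members passed through the ctor" idea. It is only a sketch under assumptions: `std::string` replaces `StringRef` so it builds without LLVM, the constructor argument order (default, best size, best speed) is my guess at what the elided `...` stands for, the level numbers are the placeholders from the snippet above rather than zlib's real constants, and `describe` is a hypothetical helper.

```cpp
#include <cassert>
#include <string>

enum class CompressionType { Zlib, Zstd };

// Properties are plain const members set once by the derived class, so
// reading them needs no virtual call.
struct CompressionAlgorithm {
  const std::string Name;
  const CompressionType Type;
  const int DefaultLevel;
  const int BestSizeLevel;
  const int BestSpeedLevel;
  virtual ~CompressionAlgorithm() = default;

protected:
  // Protected: only concrete algorithms may construct the base.
  CompressionAlgorithm(const std::string &Name, CompressionType Type,
                       int DefaultLevel, int BestSizeLevel, int BestSpeedLevel)
      : Name(Name), Type(Type), DefaultLevel(DefaultLevel),
        BestSizeLevel(BestSizeLevel), BestSpeedLevel(BestSpeedLevel) {
    // Assumption for this sketch: a higher level means smaller output.
    assert(BestSpeedLevel <= DefaultLevel && DefaultLevel <= BestSizeLevel);
  }
};

struct ZlibCompressionAlgorithm : CompressionAlgorithm {
  ZlibCompressionAlgorithm()
      : CompressionAlgorithm("zlib", CompressionType::Zlib, 5, 10, 1) {}
};

// Hypothetical helper: reads the properties through the base type only.
std::string describe(const CompressionAlgorithm &A) {
  return A.Name + " (default level " + std::to_string(A.DefaultLevel) + ")";
}
```

Since the members are const and public, callers get read-only access without getters, at the cost of the class being non-assignable, which seems fine for singletons.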

I think I have worked out something that is the best of both worlds:

I think @MaskRay's main concern, which I share to a degree, is that there's a lot of code/complexity here that doesn't currently seem warranted by the size of the problem. So even if this does provide some of the benefits (though I don't think the ability to do boolean logic, etc, is the main concern either myself or @MaskRay have - either way we'll have the enum), it adds a lot more implementation complexity to the patch, which is the very thing we're trying to address/reduce.

In D130516#3694366, @dblaikie wrote:

In D130516#3694151, @MaskRay wrote:

In D130516#3688236, @dblaikie wrote:

In D130516#3688123, @MaskRay wrote:

I'd like to make a few arguments for the current namespace+free function design, as opposed to the class+member function design as explored in this patch (but thanks for the exploration!).
Let's discuss several use cases.

(a) if a use case just calls compress/uncompress. The class design has slightly more boilerplate as it needs to get the algorithm class, a new instance, or a singleton instance.
For each new use, the number of lines may not differ, but the involvement of a static class member or an instance makes the reader wonder whether the object will be reused or thrown away.
There is some slight cognitive burden.
The class design has a non-trivial one-shot cost to have a function returning the singleton instance.

Though there must've been a condition that dominates this use somewhere - I'd suggest that condition could be where the algorithm is retrieved, and then passed to this code to use unconditionally.

If the algorithm object is const and raw pointers/references are used, I think it makes it clear to the reader that there's no ownership here, and it's not stateful when compressing/decompressing.

A pointer to a singleton compression class is isomorphic to an enum class CompressionType variable.

I don't mean to suggest that either design is fundamentally more or less functional - I'm totally OK with/agree that both design directions allow the implementation of all the desired final/end-user-visible functionality.

I'm trying to make a point about which, I think, achieves that goal in a "better" way - that's the space of design discussions, I think - what kinds of (developer, maintenance, etc) costs different designs incur.

[...]

I don't understand what you're saying here. Could you rephrase/expand a bit?

Maybe it's easier if I either post a patch, or at least more explicitly flesh out what I'm picturing/proposing/suggesting:

I agree that it is easier if you post a patch so that we can have discussion on concrete code. I think we are now at the "a picture is worth a thousand words" stage:)

In D130516#3694422, @dblaikie wrote:

The current code here still seems more complicated than I'd prefer - looks like currently the size/speed/default levels are currently unused, so maybe we can omit those for now, knowing they will be added?
And the CompressionKind with all its operator overloads seems like a lot of surface area that is pretty non-obvious for usage - boolean testable, logical operator overloads, etc.
Could we have only one decompress/compress function each, for now?
& maybe leave out the name/enum from the base class for now, add it in later (& I think I mentioned in another comment those properties can be non-virtual, maybe even direct const members - passed into the base through the ctor from the derived class)

Yes, I can continue to trim down the implementation! I agree with your sentiment.

Maybe it's easier if I either post a patch, or at least more explicitly flesh out what I'm picturing/proposing/suggesting:
Header:

struct CompressionAlgorithm {
  virtual void Compress(...);
  virtual void Decompress(...);
};
enum class CompressionType {
  Zlib, Zstd
};
CompressionAlgorithm *getCompressionAlgorithm(CompressionType);

Implementation:

#if LLVM_ENABLE_ZLIB
struct ZlibCompressionAlgorithm : CompressionAlgorithm {
  void Compress(...) { ... }
  void Decompress(...) { ... }
};
#endif
...
CompressionAlgorithm *getCompressionAlgorithm(CompressionType T) {
  switch (T) {
  case CompressionType::Zlib: {
#if LLVM_ENABLE_ZLIB
    static ZlibCompressionAlgorithm A;
    return &A;
#else
    break;
#endif
  }
...
  }
  return nullptr;
}

I agree with some of this, I have some strong thoughts I would like to work out about the whole nullptr being none or unsupported a little preemptively IMO.

Usage:
if (CompressionAlgorithm *C = getCompressionAlgorithm(CompressionType::Zlib)) {
  C->Compress(...);
}

currently, you can do

if (CompressionKind C = CompressionKind::Zlib) {
  C->compress(...);
}

And, yeah, I think it'd be suitable to eventually add name, type, size/speed/default levels:
struct CompressionAlgorithm {
  const StringRef Name;
  const CompressionType Type;
  const int DefaultLevel;
  const int BestSizeLevel;
  const int BestSpeedLevel;
  virtual void Compress(...);
  virtual void Decompress(...);
protected:
  CompressionAlgorithm(StringRef Name, CompressionType Type, ...) : Name(Name), Type(Type), ... {}
};
struct ZlibCompressionAlgorithm : CompressionAlgorithm {
  ZlibCompressionAlgorithm() : CompressionAlgorithm("zlib", CompressionType::Zlib, 5, 10, 1) { }
  /* as before */
};
...
Though those can be added as needed - good to keep in mind that they're a useful direction to go, but might simplify the review/discussion to omit them for now.

Harbormaster completed remote builds in B178814: Diff 449375.Aug 2 2022, 12:47 PM

compression api: greatly simplify implementation
+ cuts around 100 lines of code from compression.h and compression.cpp

ckissane edited the summary of this revision. Aug 2 2022, 1:54 PM

Harbormaster completed remote builds in B178855: Diff 449421.Aug 2 2022, 2:55 PM

Yes, I can continue to trim down the implementation! I agree with your sentiment.

Thanks! This update helps - though I think we'll still want to further isolate the different pieces of this change/reduce this further.

I agree with some of this, I have some strong thoughts I would like to work out about the whole nullptr being none or unsupported a little preemptively IMO.

Could you clarify what use cases you have in mind that require the nuance between none and unsupported? (arguably this accessor function could assert when passed None - would that make it simpler? Then the only null return would be unsupported)
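To make the suggestion concrete, here is a minimal sketch of an accessor that asserts on None, so that a null return can only mean "unsupported in this build". Everything here is hypothetical: the `None` enumerator, the `HaveZlib`/`HaveZstd` constants (stand-ins for build configuration macros), and the `plan`/`Outcome` helper that shows how a caller would separate the two states.

```cpp
#include <cassert>

enum class CompressionType { None, Zlib, Zstd };

struct CompressionAlgorithm {
  const char *Name;
};

// Stand-ins for build-time configuration; in this sketch zlib is enabled
// and zstd is not.
constexpr bool HaveZlib = true;
constexpr bool HaveZstd = false;

// Precondition: T is not None. A null result therefore has exactly one
// meaning: the requested algorithm is not available in this build.
const CompressionAlgorithm *getCompressionAlgorithm(CompressionType T) {
  assert(T != CompressionType::None && "callers handle None before lookup");
  static const CompressionAlgorithm Zlib{"zlib"};
  static const CompressionAlgorithm Zstd{"zstd"};
  switch (T) {
  case CompressionType::Zlib:
    return HaveZlib ? &Zlib : nullptr;
  case CompressionType::Zstd:
    return HaveZstd ? &Zstd : nullptr;
  case CompressionType::None:
    break;
  }
  return nullptr;
}

enum class Outcome { StoredRaw, Compressed, Unsupported };

// "No compression requested" is decided by the caller; "unsupported" is
// decided by the accessor. The two states never share a representation.
Outcome plan(CompressionType Requested) {
  if (Requested == CompressionType::None)
    return Outcome::StoredRaw;
  if (getCompressionAlgorithm(Requested))
    return Outcome::Compressed;
  return Outcome::Unsupported;
}
```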

Usage:
if (CompressionAlgorithm *C = getCompressionAlgorithm(CompressionType::Zlib) {
C->compress(...);
}
currently, you can do
if (CompressionKind C = CompressionKind::Zlib) {
  C->compress(...);
}

The implementation complexity is a concern too, though. I think having CompressionKind, boolean conversions and logical operator overloads, etc, in addition to the CompressionAlgorithm doesn't seem to provide enough to justify the complexity - but perhaps I'm missing some context/understanding of the values those features provide?

What sort of use cases do you have in mind that necessitate that complexity/functionality? (specifically I see a lot of || llvm::NoneType() which seems really obtuse/unclear why a user of the API would think to do that/understand that was the right/necessary thing to do)

Maybe for comparison purposes it'd be good not to replace this API but to add it on top of the underlying API so only one callsite can be updated, rather than all the changes necessary to update all clients in one go (& in this way maybe omit some of the functionality in this first patch, since it won't have to cover all use cases - eg: those extra fields (name/compression levels (best size/speed/default), etc) could possibly be omitted from the first version of this patch, so the patch only adds enough functionality for one of the compression clients (like MC, to match/compare with @MaskRay's patch) rather than all of them)

llvm/include/llvm/Support/Compression.h
48–51	Could we skip this wrapper & just have the underlying function (& also we shouldn't be overloading by case like this anyway - please name all the decompress/compress functions with the same case/spelling)

ckissane added inline comments.Aug 3 2022, 11:20 AM

llvm/include/llvm/Support/Compression.h
48–51	good point

remove uppercase Compress, Decompress

Harbormaster completed remote builds in B179114: Diff 449772.Aug 3 2022, 2:59 PM

remove compression kind || && overload

ckissane edited the summary of this revision. Aug 4 2022, 12:08 PM

Harbormaster completed remote builds in B179361: Diff 450093.Aug 4 2022, 1:58 PM

@dblaikie @MaskRay I would like it if you could all take another look.

In response to @dblaikie 's comments about implementation weight I have greatly simplified the implementation, including removing extra capitalized function overloads (Compress, Decompress), and removing || and && operator overrides. Also adopting the recently suggested class impl in part.

ckissane edited the summary of this revision. Aug 4 2022, 2:56 PM

ckissane edited the summary of this revision.

fix some nits in Compression.h

move if into switch

leonardchan added inline comments.Aug 4 2022, 5:39 PM

clang-tools-extra/clangd/index/Serialization.cpp
251–252	nit: I think `error` accepts format-like arguments, so you could have something similar to above with: return error("Compressed string table, but {0} is unavailable", CompressionScheme->Name);
clang/lib/Serialization/ASTReader.cpp
1469–1471	https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements
1475–1476	I think this `Error` takes a StringRef, so I think you could have: return Error("compression class " + CompressionScheme->Name + " is not available");
lld/ELF/InputSection.cpp
1234	`typename` might not be needed here
llvm/include/llvm/Support/Compression.h
28–112	https://llvm.org/docs/CodingStandards.html#use-of-class-and-struct-keywords since the ctor is non-public
40	`this->` might not be needed here
86	I think this cast might not be needed
106	I think this cast might not be needed
llvm/lib/ObjCopy/ELF/ELFObject.cpp
442–461	What's the explanation for having the `llvm_unreachable` branch and getting the compression type? I would've thought this section would just be: if (Error Err1 = compression::CompressionKind::Zlib->decompress( Compressed, DecompressedContent, static_cast<size_t>(Sec.Size))) { return createStringError(errc::invalid_argument, "'" + Sec.Name + "': " + toString(std::move(Err1))); } which looks like it would have identical behavior to the old code.
530–537	Same here. Should this just be `compression::CompressionKind::Zlib->compress(OriginalData, CompressedData);`? If this is in preparation for the ELF+zstd changes, perhaps we should save those for another patch once that lands?
llvm/lib/Support/Compression.cpp
53–79	nit: add `override`s to be more explicit these are virtual methods
83–95	If the `llvm_unreachable`s should be the default implementation for all subclasses, perhaps the `[de]compress` methods should be regular virtual with these default implementations rather than abstract virtual.
152	I think this cast might not be needed
152–155	Same here
179–182	https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements or perhaps just return left ? left : NoneType();

Harbormaster completed remote builds in B179428: Diff 450176.Aug 4 2022, 5:49 PM

dblaikie added inline comments.Aug 4 2022, 11:47 PM

llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp
55–61	This still seems like a lot of hoops to jump through - why "noneIfUnsupported" rather than either having the compression scheme (I think it could be the CompressionAlgorithm itself, rather than having the separate OptionalCompressionKind abstraction) either be null itself, or expose an "isAvailable" operation directly on the CompressionAlgorithm? Even if the CompressionKind/OptionalCompressionKind/CompressionAlgorithm abstractions are kept, I'm not sure why the above code is preferred over, say: if (Compress && DoInstrProfNameCompression && OptionalCompressionScheme /* .isAvailable(), if we want to be more explicit */) { ... } What's the benefit that `noneIfUnsupported` is providing? (& generally I'd expect the `Compress && DoInstrProfNameCompression` to be tested/exit early before even naming/constructing/querying/doing anything with the compression scheme/algorithm/etc - so there'd be no need to combine the tests for availability and the tests for whether compression was requested) Perhaps this API is motivated by a desire to implement something much closer to the original code than is necessary/suitable? Or some other use case/benefit I'm not quite understanding yet?

ckissane added inline comments.Aug 5 2022, 11:09 AM

llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp
55–61	I shall remove `noneIfUnsupported`. You express good points, we can simply check `if(OptionalCompressionScheme && *OptionalCompressionScheme)` where necessary.

ckissane added inline comments.Aug 5 2022, 11:11 AM

llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp
55–61	though it will make a lot of existing code patterns less clear, and more verbose

ckissane added inline comments.Aug 5 2022, 11:13 AM

llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp
55–61	and sometimes you really do need to re code the exact thing `noneIfUnsupported` encapsulates...

dblaikie added inline comments.Aug 5 2022, 11:21 AM

llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp
55–61	Are there examples within LLVM that you can show compare/contrast `noneIfUnsupported` helps?

cleanup some compression nits

ckissane added inline comments.Aug 5 2022, 11:36 AM

llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp
55–61	yes, I'll paste a couple here

ckissane added inline comments.Aug 5 2022, 11:59 AM

llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp
55–61	Ok, so I believe I was mistaken. In older versions of this patch there was a none compression implementation that just did a memcpy; this made a natural fall-through state, which made this type of pattern advantageous. However, this is no longer the case. Hence I will remove this without further ado. Thank you for your astute observation!

remove OptionalCompressionKind noneIfUnsupported

ckissane added inline comments.Aug 5 2022, 12:06 PM

clang/lib/Serialization/ASTReader.cpp
1475–1476	unfortunately that doesn't work (I tried)

format error string

Harbormaster completed remote builds in B179558: Diff 450345.Aug 5 2022, 12:50 PM

fix InputSection decompress issue

Merge remote-tracking branch 'origin/main' into ckissane.compression-class-simple

Harbormaster completed remote builds in B179600: Diff 450401.Aug 5 2022, 2:44 PM

This is looking pretty close to what I've been picturing - the only thing remaining is that I think we could get rid of CompressionKind and access the CompressionAlgorithm directly - basically the contents of CompressionKind::operator-> could be a free/public function const CompressionAlgorithm &getCompressionAlgorithm(enum class CompressionKind); - and have it return a reference to the local static implementation, with a none implementation (for those profile cases where you want to pass in an algorithm if it's available, or none) and one implementation for each of zlib/zstd?
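Roughly what I have in mind, as a sketch only: a free function returning a reference to a function-local static, with a pass-through "none" implementation used both for None and for anything not built in. The class names, the `roundTrip` helper, and the XOR toy codec (standing in for zlib so this compiles without dependencies) are all hypothetical; treating Zstd as "not built in" is an assumption of the example.

```cpp
#include <cassert>
#include <string>

enum class CompressionKind { None, Zlib, Zstd };

class CompressionAlgorithm {
public:
  virtual ~CompressionAlgorithm() = default;
  virtual std::string compress(const std::string &In) const = 0;
  virtual std::string decompress(const std::string &In) const = 0;
};

// "None" implementation: bytes pass through unchanged.
class NoneAlgorithm final : public CompressionAlgorithm {
public:
  std::string compress(const std::string &In) const override { return In; }
  std::string decompress(const std::string &In) const override { return In; }
};

// Toy reversible transform standing in for a real codec; it does not compress.
class ToyXorAlgorithm final : public CompressionAlgorithm {
  static std::string flip(std::string S) {
    for (char &C : S)
      C = static_cast<char>(C ^ 0x5A);
    return S;
  }

public:
  std::string compress(const std::string &In) const override { return flip(In); }
  std::string decompress(const std::string &In) const override { return flip(In); }
};

// Function-local statics: no global constructors, and initialization is
// thread-safe since C++11. The objects hold no state, so sharing is safe.
const CompressionAlgorithm &getCompressionAlgorithm(CompressionKind K) {
  static const NoneAlgorithm None{};
  static const ToyXorAlgorithm Zlib{};
  switch (K) {
  case CompressionKind::Zlib:
    return Zlib;
  case CompressionKind::Zstd: // treated as not built in for this sketch
  case CompressionKind::None:
    break;
  }
  return None;
}

// Callers never see a null state: unavailable collapses into pass-through.
std::string roundTrip(CompressionKind K, const std::string &Text) {
  const CompressionAlgorithm &A = getCompressionAlgorithm(K);
  std::string Restored = A.decompress(A.compress(Text));
  assert(Restored == Text && "codec must be lossless");
  return Restored;
}
```

The trade-off is that a caller who needs to report "zstd requested but unavailable" can no longer tell from the returned object alone, so this shape only suits the profile-style callers that want the fallback.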

clang-tools-extra/clangd/index/Serialization.cpp
238	What purpose is this expression serving? (isn't it a tautology, given that `CompressionScheme` was initialized a couple of lines back with `CompressionKind::Zlib`?)
clang/lib/Serialization/ASTWriter.cpp
2003–2008	Generally LLVM style rolls these together
llvm/lib/Object/Decompressor.cpp
41–50	Maybe leave this code more like it was before - it can turn into a switch over `ELFCompressionSchemeId` when Zstd is added here.
llvm/lib/ProfileData/Coverage/CoverageMappingReader.cpp
122–123	Be nice to share the same CompressionKind
llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp
52–57	Why/how does `OptionalCompressionKind` end up in here, compared with this (suggested edit)?
llvm/lib/ProfileData/InstrProf.cpp
465	Yeah, this seems awkward, but I see what you're getting at - if you're going to pass around an algorithm to use, and this particular kind of use case wants to collapse the "algorithm not available" and "no compression was requested" - so, yeah, for this sort of use case I can see the value in having a null compression algorithm implementation. That shouldn't add a lot of complexity to the implementation and should simplify the usage so it doesn't have two layers of "not present" state.
494	Presumably this doesn't need to test `OptionalCompressionScheme` here, though, since it's tested in the implementation?
llvm/lib/ProfileData/SampleProfReader.cpp
880–883	Nice to pull out a common variable rather than accessing `CompressionKind::Zlib` twice independently (otherwise we probably might as well go with the more direct API @MaskRay has proposed)

address review comments

In D130516#3703691, @dblaikie wrote:

This is looking pretty close to what I've been picturing - the only thing remaining is that I think we could get rid of CompressionKind and access the CompressionAlgorithm directly - basically the contents of CompressionKind::operator-> could be a free/public function const CompressionAlgorithm &getCompressionAlgorithm(enum class CompressionKind); - and have it return a reference to the local static implementation, with a none implementation (for those profile cases where you want to pass in an algorithm if it's available, or none) and one implementation for each of zlib/zstd?

I can see what you are asking for, however since its behavior is essentially the same, and still uses both enum values and class implementations, I don't see any practical advantages (though if there is something I am failing to observe let me know).
Additionally, I can see some disadvantages in largely increasing code verbosity across the codebase.
A snippet of an example from elf: if(CompressionKind::Zlib) CompressionKind::Zlib->compress...
would turn into if(getCompressionAlgorithm(CompressionKind::Zlib)) getCompressionAlgorithm(CompressionKind::Zlib)->compress...
(assuming bool operator overload is also moved to class)
Because of this I am leaning away from such an implementation change.

Harbormaster completed remote builds in B180236: Diff 451247.Aug 9 2022, 2:12 PM

I have only taken a very brief look at the new version. Having an enum class CompressionKind with a parallel CompressionAlgorithm seems redundant.
friend CompressionAlgorithm *CompressionKind::operator->() const; looks magical.

I hope that someone insisting on object-oriented design can put up a version with less boilerplate to compete with D130506.

In D130516#3710903, @ckissane wrote:

In D130516#3703691, @dblaikie wrote:

This is looking pretty close to what I've been picturing - the only thing remaining is that I think we could get rid of CompressionKind and access the CompressionAlgorithm directly - basically the contents of CompressionKind::operator-> could be a free/public function const CompressionAlgorithm &getCompressionAlgorithm(enum class CompressionKind); - and have it return a reference to the local static implementation, with a none implementation (for those profile cases where you want to pass in an algorithm if it's available, or none) and one implementation for each of zlib/zstd?

I can see what you are asking for, however since its behavior is essentially the same, and still uses both enum values and class implementations, I don't see any practical advantages (though if there is something I am failing to observe let me know).

The intent is to simplify both implementation and usage because this currently feels like overkill for a fairly small abstraction benefit (compared to @MaskRay's posted alternative, for instance) - we're abstracting only a handful of use cases over only 2 implementations, so this shouldn't be too complicated/overengineered.

Additionally, I can see some disadvantages in largely increasing code verbosity across the codebase.
A snippet of an example from elf: if(CompressionKind::Zlib) CompressionKind::Zlib->compress...
would turn into if(getCompressionAlgorithm(CompressionKind::Zlib)) getCompressionAlgorithm(CompressionKind::Zlib)->compress...

I think in either case this should be changed to something like this (specifically avoid writing CompressionKind::Zlib twice):

if (auto C = CompressionKind::Zlib)
  C->compress...

(assuming bool operator overload is also moved to class)
Because of this I am leaning away from such an implementation change.

In D130516#3710972, @MaskRay wrote:

I have only taken a very brief look at the new version. Having an enum class CompressionKind with a parallel CompressionAlgorithm seems redundant.
friend CompressionAlgorithm *CompressionKind::operator->() const; looks magical.

I hope that someone insisting on object-oriented design can put up a version with less boilerplate to compete with D130506.

Posted something more comparable to D130506 in D131638 - hard to compare, though - D130506 is additive, whereas D131638 and this D130516 are more replacements - though a difference with my change there is that, at least for the initial patch, it leaves the old APIs in place, with the intent to incrementally change the usages until the old API can be removed.

MaskRay mentioned this in D131638: Alternative compression API design for illustration.Aug 10 2022, 6:17 PM

ckissane mentioned this in D131992: [Support] compression proposal for a enum->spec->impl approach.Aug 16 2022, 2:02 PM

Revision Contents

Path	Size
clang-tools-extra/clangd/index/Serialization.cpp	49 lines
clang-tools-extra/clangd/unittests/SerializationTests.cpp	3 lines
clang/lib/Driver/ToolChains/Clang.cpp	2 lines
clang/lib/Serialization/ASTReader.cpp	9 lines
clang/lib/Serialization/ASTWriter.cpp	10 lines
lld/ELF/Driver.cpp	12 lines
lld/ELF/InputSection.cpp	8 lines
llvm/include/llvm/Object/Decompressor.h	5 lines
llvm/include/llvm/ProfileData/InstrProf.h	7 lines
llvm/include/llvm/Support/Compression.h	130 lines
llvm/lib/MC/ELFObjectWriter.cpp	9 lines
llvm/lib/ObjCopy/ELF/ELFObject.cpp	31 lines
llvm/lib/Object/Decompressor.cpp	21 lines
llvm/lib/ProfileData/Coverage/CoverageMappingReader.cpp	44 lines
llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp	20 lines
llvm/lib/ProfileData/InstrProf.cpp	24 lines
llvm/lib/ProfileData/InstrProfCorrelator.cpp	5 lines
llvm/lib/ProfileData/SampleProfReader.cpp	22 lines
llvm/lib/ProfileData/SampleProfWriter.cpp	8 lines
llvm/lib/Support/Compression.cpp	241 lines
llvm/tools/llvm-mc/llvm-mc.cpp	2 lines
llvm/tools/llvm-objcopy/ObjcopyOptions.cpp	16 lines
llvm/unittests/ProfileData/InstrProfTest.cpp	28 lines
llvm/unittests/Support/CompressionTest.cpp	73 lines

Commit	Tree	Parents	Author	Summary	Date
032b75331c1b	3ca6e7dff977	922a782cb44d	Cole Kissane	adress review comments	Aug 9 2022, 12:58 PM
922a782cb44d	b2f2e0712abc	e846120f6c9a 3fa291fa925d	Cole Kissane	Merge remote-tracking branch 'origin/main' into ckissane.compression-class… (Show More…)	Aug 5 2022, 2:04 PM
e846120f6c9a	cd5a67f3680b	f59b25b9405b	Cole Kissane	fix InputSection decompress issue	Aug 5 2022, 1:59 PM
f59b25b9405b	854c920d259a	a760d56c773f	Cole Kissane	format error string	Aug 5 2022, 12:08 PM
a760d56c773f	48069c4e1e87	2424a485146c	Cole Kissane	remove OptionalCompressionKind noneIfUnsupported	Aug 5 2022, 12:03 PM
2424a485146c	7d5d57140a6c	617684b226d0	Cole Kissane	cleanup some compression nits	Aug 5 2022, 11:32 AM
617684b226d0	55f841f101f0	4a22ee8acf1c	Cole	move if into switch	Aug 4 2022, 4:23 PM
4a22ee8acf1c	e00fbf4f0c19	aa20bd806821	Cole	fix some nits in Compression.h	Aug 4 2022, 4:12 PM
aa20bd806821	6073d2efe836	2ea3c3e5a733	Cole	remove compression kind \|\| && overload	Aug 4 2022, 12:04 PM
2ea3c3e5a733	22a49698d926	51d60c9fdbb7	Cole	remove uppercase Compress, Decompress	Aug 3 2022, 1:23 PM
51d60c9fdbb7	bed93c4ea3ee	f6be2d05584c	Cole Kissane	compression api: greatly simplify implementation	Aug 2 2022, 1:51 PM
f6be2d05584c	be70ba377a97	56d6f5535796 20f7f9b709df	Cole Kissane	Merge remote-tracking branch 'origin/main' into ckissane.compression-class… (Show More…)	Aug 2 2022, 12:30 PM
56d6f5535796	9cbd1ba838e5	3ae4742a287b	Cole Kissane	CompressionKind: clean up param names to == op	Aug 2 2022, 12:11 PM
3ae4742a287b	84b6af082d14	178325e0bef6	Cole Kissane	trim down compression api: remove supported()	Aug 2 2022, 11:42 AM
178325e0bef6	024b4e086774	d9137066cb63	Cole Kissane	make a zlib corruption check specific	Aug 2 2022, 11:21 AM
d9137066cb63	ae6874829116	41e2519fed58	Cole Kissane	feat compression "enum" with methods	Aug 2 2022, 10:44 AM
41e2519fed58	1908243f57db	1d44c64af1fe 8e51917b39cd	Cole	Merge remote-tracking branch 'origin/main' into ckissane.compression-class… (Show More…)	Aug 1 2022, 10:52 AM
1d44c64af1fe	862e883fe20b	a3dec6971f38	Cole Kissane	fix usage of CompressionAlgorithmFromId	Jul 28 2022, 4:29 PM
a3dec6971f38	b00574d12112	5f2689ab93b7	Cole Kissane	make compression singletons	Jul 28 2022, 4:14 PM
5f2689ab93b7	dbec25a5c26d	4ce555bb77f9	Cole Kissane	use CompressionScheme->notNone() in InstrProf	Jul 27 2022, 3:05 PM
4ce555bb77f9	a9619507e969	b02e51d5b6d9	Cole Kissane	fix compression inheritence and some more compression class helpers	Jul 27 2022, 2:41 PM
b02e51d5b6d9	729847f5902e	44a3e23d8d96 68901fdbebb7	Cole Kissane	Merge remote-tracking branch 'origin/main' into ckissane.compression-class… (Show More…)	Jul 27 2022, 1:40 PM
44a3e23d8d96	394db5fb3696	ef97668da245 41e776c72c58	Cole Kissane	merge fix	Jul 25 2022, 2:28 PM
ef97668da245	394db5fb3696	46cc0ef2391c	Cole Kissane	format	Jul 25 2022, 2:26 PM
46cc0ef2391c	7f433b9157a2	62531518f989	Cole Kissane	[Support] compression classes (Show More…)	Jul 25 2022, 2:23 PM
41e776c72c58	7f433b9157a2	62531518f989	Cole Kissane	[Support] compression classes	Jul 25 2022, 2:23 PM

Diff 451247

clang-tools-extra/clangd/index/Serialization.cpp

#include "RIFF.h"		#include "RIFF.h"
#include "index/MemIndex.h"		#include "index/MemIndex.h"
#include "index/SymbolLocation.h"		#include "index/SymbolLocation.h"
#include "index/SymbolOrigin.h"		#include "index/SymbolOrigin.h"
#include "index/dex/Dex.h"		#include "index/dex/Dex.h"
#include "support/Logger.h"		#include "support/Logger.h"
#include "support/Trace.h"		#include "support/Trace.h"
#include "clang/Tooling/CompilationDatabase.h"		#include "clang/Tooling/CompilationDatabase.h"
		#include "llvm/ADT/Optional.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/Support/Compiler.h"		#include "llvm/Support/Compiler.h"
#include "llvm/Support/Compression.h"		#include "llvm/Support/Compression.h"
#include "llvm/Support/Endian.h"		#include "llvm/Support/Endian.h"
#include "llvm/Support/Error.h"		#include "llvm/Support/Error.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include <cstdint>		#include <cstdint>
#include <vector>		#include <vector>

		using namespace llvm::compression;

namespace clang {		namespace clang {
namespace clangd {		namespace clangd {
namespace {		namespace {

// IO PRIMITIVES		// IO PRIMITIVES
		dblaikie: We're generally trying to avoid global ctors in LLVM. So at most this should be a static local variable in a function that accesses the algorithm (though perhaps these "compression algorithm" classes shouldn't be accessible directly, and only through singleton accessors defined alongside them; there's no reason for LLVM to contain more than one instance of ZlibCompressionAlgorithm, I think?)
		ckissane (author): Your idea seems correct to me; however, some algorithms, such as zstd, support running in multiple threads, so this might influence our decision.
		dblaikie: So long as the compression algorithm objects are stateless, this would be acceptable - such objects would be thread safe for multiple concurrent users.
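A minimal sketch of the function-local static accessor suggested in this thread. It uses only the standard library; `Algorithm`, `IdentityAlgorithm`, and `getIdentityAlgorithm` are hypothetical names chosen for illustration and are not part of the patch or of LLVM.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical interface, for illustration only (not the patch's API).
struct Algorithm {
  virtual ~Algorithm() = default;
  virtual const char *name() const = 0;
  virtual std::vector<uint8_t>
  compress(const std::vector<uint8_t> &In) const = 0;
};

namespace {
// Stateless stand-in "algorithm" that simply copies its input. Because it
// holds no mutable state, concurrent callers can share one instance.
struct IdentityAlgorithm final : Algorithm {
  const char *name() const override { return "identity"; }
  std::vector<uint8_t>
  compress(const std::vector<uint8_t> &In) const override {
    return In;
  }
};
} // namespace

// The instance is constructed on first call instead of before main(), so no
// global constructor is needed. Since C++11 the initialization of a
// function-local static is thread-safe.
const Algorithm &getIdentityAlgorithm() {
  static const IdentityAlgorithm Instance;
  return Instance;
}
```

Every call returns a reference to the same object, so callers never hold a by-value copy of the base class.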
// We use little-endian 32 bit ints, sometimes with variable-length encoding.		// We use little-endian 32 bit ints, sometimes with variable-length encoding.
//		//
// Variable-length int encoding (varint) uses the bottom 7 bits of each byte		// Variable-length int encoding (varint) uses the bottom 7 bits of each byte
// to encode the number, and the top bit to indicate whether more bytes follow.		// to encode the number, and the top bit to indicate whether more bytes follow.
// e.g. 9a 2f means [0x1a and keep reading, 0x2f and stop].		// e.g. 9a 2f means [0x1a and keep reading, 0x2f and stop].
// This represents 0x1a \| 0x2f<<7 = 6042.		// This represents 0x1a \| 0x2f<<7 = 6042.
// A 32-bit integer takes 1-5 bytes to encode; small numbers are more compact.		// A 32-bit integer takes 1-5 bytes to encode; small numbers are more compact.
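The varint scheme described in the comment above can be sketched in standalone C++ as follows. The helper names `encodeVarint` and `decodeVarint` are hypothetical and are not clangd's own functions; the sketch only reproduces the byte layout the comment describes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Emit the low 7 bits of V per byte, least-significant group first, setting
// the top bit on every byte except the last.
std::vector<uint8_t> encodeVarint(uint32_t V) {
  std::vector<uint8_t> Out;
  while (V >= 0x80) {
    Out.push_back(static_cast<uint8_t>((V & 0x7f) | 0x80));
    V >>= 7;
  }
  Out.push_back(static_cast<uint8_t>(V));
  return Out;
}

// Reassemble the value; stop at the first byte whose top bit is clear, or
// after five bytes so the shift never reaches 32.
uint32_t decodeVarint(const std::vector<uint8_t> &Bytes) {
  uint32_t V = 0;
  unsigned Shift = 0;
  for (uint8_t B : Bytes) {
    V |= static_cast<uint32_t>(B & 0x7f) << Shift;
    if (!(B & 0x80) || Shift >= 28)
      break;
    Shift += 7;
  }
  return V;
}
```

With these helpers, 6042 encodes to the two bytes 9a 2f from the comment, and a full 32-bit value needs five bytes.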

▲ Show 20 Lines • Show All 144 Lines • ▼ Show 20 Lines	void finalize(llvm::raw_ostream &OS) {
for (unsigned I = 0; I < Sorted.size(); ++I)		for (unsigned I = 0; I < Sorted.size(); ++I)
Index.try_emplace({Sorted[I].data(), Sorted[I].size()}, I);		Index.try_emplace({Sorted[I].data(), Sorted[I].size()}, I);

std::string RawTable;		std::string RawTable;
for (llvm::StringRef S : Sorted) {		for (llvm::StringRef S : Sorted) {
RawTable.append(std::string(S));		RawTable.append(std::string(S));
RawTable.push_back(0);		RawTable.push_back(0);
}		}
if (llvm::compression::zlib::isAvailable()) {		if (CompressionKind::Zlib) {
llvm::SmallVector<uint8_t, 0> Compressed;		llvm::SmallVector<uint8_t, 0> Compressed;
		leonardchan: Will this leak?
llvm::compression::zlib::compress(llvm::arrayRefFromStringRef(RawTable),		CompressionKind::Zlib->compress(llvm::arrayRefFromStringRef(RawTable),
Compressed);		Compressed);
write32(RawTable.size(), OS);		write32(RawTable.size(), OS);
OS << llvm::toStringRef(Compressed);		OS << llvm::toStringRef(Compressed);
} else {		} else {
write32(0, OS); // No compression.		write32(0, OS); // No compression.
OS << RawTable;		OS << RawTable;
}		}
}		}
// Get the ID of an string, which must be interned. Table must be finalized.		// Get the ID of an string, which must be interned. Table must be finalized.
Show All 14 Lines	llvm::Expected<StringTableIn> readStringTable(llvm::StringRef Data) {
size_t UncompressedSize = R.consume32();		size_t UncompressedSize = R.consume32();
if (R.err())		if (R.err())
return error("Truncated string table");		return error("Truncated string table");

llvm::StringRef Uncompressed;		llvm::StringRef Uncompressed;
llvm::SmallVector<uint8_t, 0> UncompressedStorage;		llvm::SmallVector<uint8_t, 0> UncompressedStorage;
if (UncompressedSize == 0) // No compression		if (UncompressedSize == 0) // No compression
Uncompressed = R.rest();		Uncompressed = R.rest();
else if (llvm::compression::zlib::isAvailable()) {		else {
		// Don't extratc to a CompressionKind CompressionScheme variable
		// as ratio check is zlib specific
		if (CompressionKind::Zlib) {
// Don't allocate a massive buffer if UncompressedSize was corrupted		// Don't allocate a massive buffer if UncompressedSize was corrupted
// This is effective for sharded index, but not big monolithic ones, as		// This is effective for sharded index, but not big monolithic ones, as
// once compressed size reaches 4MB nothing can be ruled out.		// once compressed size reaches 4MB nothing can be ruled out.
// Theoretical max ratio from https://zlib.net/zlib_tech.html		// Theoretical max ratio from https://zlib.net/zlib_tech.html
constexpr int MaxCompressionRatio = 1032;		constexpr int MaxCompressionRatio = 1032;
		dblaikie: What purpose is this expression serving? (Isn't it a tautology, given that `CompressionScheme` was initialized a couple of lines back with `CompressionKind::Zlib`?)
if (UncompressedSize / MaxCompressionRatio > R.rest().size())		if (UncompressedSize / MaxCompressionRatio > R.rest().size())
return error("Bad stri table: uncompress {0} -> {1} bytes is implausible",		return error(
		"Bad stri table: uncompress {0} -> {1} bytes is implausible",
R.rest().size(), UncompressedSize);		R.rest().size(), UncompressedSize);

if (llvm::Error E = llvm::compression::zlib::uncompress(		if (llvm::Error E = CompressionKind::Zlib->decompress(
llvm::arrayRefFromStringRef(R.rest()), UncompressedStorage,		llvm::arrayRefFromStringRef(R.rest()), UncompressedStorage,
UncompressedSize))		UncompressedSize))
return std::move(E);		return std::move(E);
Uncompressed = toStringRef(UncompressedStorage);		Uncompressed = toStringRef(UncompressedStorage);
} else		} else
return error("Compressed string table, but zlib is unavailable");		return error("Compressed string table, but {0} is unavailable",
		CompressionKind::Zlib->Name);
		}
		leonardchan: nit: I think `error` accepts format-like arguments, so you could have something similar to above with: `return error("Compressed string table, but {0} is unavailable", CompressionScheme->Name);`

StringTableIn Table;		StringTableIn Table;
llvm::StringSaver Saver(Table.Arena);		llvm::StringSaver Saver(Table.Arena);
R = Reader(Uncompressed);		R = Reader(Uncompressed);
for (Reader R(Uncompressed); !R.eof();) {		for (Reader R(Uncompressed); !R.eof();) {
auto Len = R.rest().find(0);		auto Len = R.rest().find(0);
if (Len == llvm::StringRef::npos)		if (Len == llvm::StringRef::npos)
return error("Bad string table: not null terminated");		return error("Bad string table: not null terminated");
▲ Show 20 Lines • Show All 499 Lines • Show Last 20 Lines
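The plausibility check in `readStringTable` above (reject a claimed uncompressed size that exceeds the zlib maximum ratio of 1032 times the compressed size) can be isolated as a small predicate. This is a sketch only; `isPlausibleUncompressedSize` is a hypothetical helper name, not a function in clangd.

```cpp
#include <cassert>
#include <cstddef>

// Theoretical maximum zlib compression ratio, as cited in the source comment.
constexpr std::size_t MaxCompressionRatio = 1032;

// Returns false when the claimed uncompressed size could not have been
// produced from CompressedSize bytes of zlib data, mirroring the check
// `UncompressedSize / MaxCompressionRatio > R.rest().size()` in the diff.
bool isPlausibleUncompressedSize(std::size_t UncompressedSize,
                                 std::size_t CompressedSize) {
  return UncompressedSize / MaxCompressionRatio <= CompressedSize;
}
```

As the source comment notes, the check stops helping once the compressed payload reaches about 4 MB, because the uncompressed size is stored in 32 bits and any 32-bit value is then within the ratio.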

clang-tools-extra/clangd/unittests/SerializationTests.cpp

Show All 19 Lines
#ifdef LLVM_ON_UNIX		#ifdef LLVM_ON_UNIX
#include <sys/resource.h>		#include <sys/resource.h>
#endif		#endif

using ::testing::ElementsAre;		using ::testing::ElementsAre;
using ::testing::Pair;		using ::testing::Pair;
using ::testing::UnorderedElementsAre;		using ::testing::UnorderedElementsAre;
using ::testing::UnorderedElementsAreArray;		using ::testing::UnorderedElementsAreArray;
		using namespace llvm::compression;

namespace clang {		namespace clang {
namespace clangd {		namespace clangd {
namespace {		namespace {

const char *YAML = R"(		const char *YAML = R"(
---		---
!Symbol		!Symbol
▲ Show 20 Lines • Show All 350 Lines • ▼ Show 20 Lines	TEST(SerializationTest, NoCrashOnBadArraySize) {
ASSERT_TRUE(!CorruptParsed);		ASSERT_TRUE(!CorruptParsed);
EXPECT_EQ(llvm::toString(CorruptParsed.takeError()),		EXPECT_EQ(llvm::toString(CorruptParsed.takeError()),
"malformed or truncated include uri");		"malformed or truncated include uri");
}		}

// Check we detect invalid string table size size without allocating it first.		// Check we detect invalid string table size size without allocating it first.
// If this detection fails, the test should allocate a huge array and crash.		// If this detection fails, the test should allocate a huge array and crash.
TEST(SerializationTest, NoCrashOnBadStringTableSize) {		TEST(SerializationTest, NoCrashOnBadStringTableSize) {
if (!llvm::compression::zlib::isAvailable()) {		if (!CompressionKind::Zlib) {
log("skipping test, no zlib");		log("skipping test, no zlib");
return;		return;
}		}

// First, create a valid serialized file.		// First, create a valid serialized file.
auto In = readIndexFile(YAML);		auto In = readIndexFile(YAML);
ASSERT_FALSE(!In) << In.takeError();		ASSERT_FALSE(!In) << In.takeError();
IndexFileOut Out(*In);		IndexFileOut Out(*In);
Show All 31 Lines

clang/lib/Driver/ToolChains/Clang.cpp

Show First 20 Lines • Show All 1,133 Lines • ▼ Show 20 Lines	static void RenderDebugInfoCompressionArgs(const ArgList &Args,
const Arg *A = Args.getLastArg(options::OPT_gz_EQ);		const Arg *A = Args.getLastArg(options::OPT_gz_EQ);
if (!A)		if (!A)
return;		return;
if (checkDebugInfoOption(A, Args, D, TC)) {		if (checkDebugInfoOption(A, Args, D, TC)) {
StringRef Value = A->getValue();		StringRef Value = A->getValue();
if (Value == "none") {		if (Value == "none") {
CmdArgs.push_back("--compress-debug-sections=none");		CmdArgs.push_back("--compress-debug-sections=none");
} else if (Value == "zlib") {		} else if (Value == "zlib") {
if (llvm::compression::zlib::isAvailable()) {		if (llvm::compression::CompressionKind::Zlib) {
CmdArgs.push_back(		CmdArgs.push_back(
Args.MakeArgString("--compress-debug-sections=" + Twine(Value)));		Args.MakeArgString("--compress-debug-sections=" + Twine(Value)));
} else {		} else {
D.Diag(diag::warn_debug_compression_unavailable);		D.Diag(diag::warn_debug_compression_unavailable);
}		}
} else {		} else {
D.Diag(diag::err_drv_unsupported_option_argument)		D.Diag(diag::err_drv_unsupported_option_argument)
<< A->getOption().getName() << Value;		<< A->getOption().getName() << Value;
▲ Show 20 Lines • Show All 7,377 Lines • Show Last 20 Lines

clang/lib/Serialization/ASTReader.cpp

Show First 20 Lines • Show All 138 Lines • ▼ Show 20 Lines
#include <tuple>		#include <tuple>
#include <utility>		#include <utility>
#include <vector>		#include <vector>

using namespace clang;		using namespace clang;
using namespace clang::serialization;		using namespace clang::serialization;
using namespace clang::serialization::reader;		using namespace clang::serialization::reader;
using llvm::BitstreamCursor;		using llvm::BitstreamCursor;
		using namespace llvm::compression;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// ChainedASTReaderListener implementation		// ChainedASTReaderListener implementation
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

bool		bool
ChainedASTReaderListener::ReadFullVersionInformation(StringRef FullVersion) {		ChainedASTReaderListener::ReadFullVersionInformation(StringRef FullVersion) {
return First->ReadFullVersionInformation(FullVersion) \|\|		return First->ReadFullVersionInformation(FullVersion) \|\|
▲ Show 20 Lines • Show All 1,302 Lines • ▼ Show 20 Lines	Expected<unsigned> MaybeRecCode =
SLocEntryCursor.readRecord(Code, Record, &Blob);		SLocEntryCursor.readRecord(Code, Record, &Blob);
if (!MaybeRecCode) {		if (!MaybeRecCode) {
Error(MaybeRecCode.takeError());		Error(MaybeRecCode.takeError());
return nullptr;		return nullptr;
}		}
unsigned RecCode = MaybeRecCode.get();		unsigned RecCode = MaybeRecCode.get();

if (RecCode == SM_SLOC_BUFFER_BLOB_COMPRESSED) {		if (RecCode == SM_SLOC_BUFFER_BLOB_COMPRESSED) {
if (!llvm::compression::zlib::isAvailable()) {		CompressionKind CompressionScheme = CompressionKind::Zlib;
Error("zlib is not available");		if (!CompressionScheme) {
		Error("compression class " +
		(CompressionScheme->Name + " is not available").str());
return nullptr;		return nullptr;
}		}
		leonardchan: https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements
SmallVector<uint8_t, 0> Uncompressed;		SmallVector<uint8_t, 0> Uncompressed;
if (llvm::Error E = llvm::compression::zlib::uncompress(		if (llvm::Error E = CompressionScheme->decompress(
llvm::arrayRefFromStringRef(Blob), Uncompressed, Record[0])) {		llvm::arrayRefFromStringRef(Blob), Uncompressed, Record[0])) {
Error("could not decompress embedded file contents: " +		Error("could not decompress embedded file contents: " +
llvm::toString(std::move(E)));		llvm::toString(std::move(E)));
		leonardchan: I think this `Error` takes a StringRef, so I think you could have: `return Error("compression class " + CompressionScheme->Name + " is not available");`
		ckissane (author): Unfortunately that doesn't work (I tried).
return nullptr;		return nullptr;
}		}
return llvm::MemoryBuffer::getMemBufferCopy(		return llvm::MemoryBuffer::getMemBufferCopy(
llvm::toStringRef(Uncompressed), Name);		llvm::toStringRef(Uncompressed), Name);
} else if (RecCode == SM_SLOC_BUFFER_BLOB) {		} else if (RecCode == SM_SLOC_BUFFER_BLOB) {
return llvm::MemoryBuffer::getMemBuffer(Blob.drop_back(1), Name, true);		return llvm::MemoryBuffer::getMemBuffer(Blob.drop_back(1), Name, true);
} else {		} else {
Error("AST record has invalid code");		Error("AST record has invalid code");
▲ Show 20 Lines • Show All 11,369 Lines • Show Last 20 Lines

clang/lib/Serialization/ASTWriter.cpp

Show First 20 Lines • Show All 112 Lines • ▼ Show 20 Lines

#include <memory> #include <memory>

#include <queue> #include <queue>

#include <tuple> #include <tuple>

#include <utility> #include <utility>

#include <vector> #include <vector>

using namespace clang; using namespace clang;

using namespace clang::serialization; using namespace clang::serialization;

using namespace llvm::compression;

template <typename T, typename Allocator> template <typename T, typename Allocator>

static StringRef bytes(const std::vector<T, Allocator> &v) { static StringRef bytes(const std::vector<T, Allocator> &v) {

if (v.empty()) return StringRef(); if (v.empty()) return StringRef();

return StringRef(reinterpret_cast<const char*>(&v[0]), return StringRef(reinterpret_cast<const char*>(&v[0]),

sizeof(T) * v.size()); sizeof(T) * v.size());

} }

▲ Show 20 Lines • Show All 1,865 Lines • ▼ Show 20 Lines

static void emitBlob(llvm::BitstreamWriter &Stream, StringRef Blob, static void emitBlob(llvm::BitstreamWriter &Stream, StringRef Blob,

unsigned SLocBufferBlobCompressedAbbrv, unsigned SLocBufferBlobCompressedAbbrv,

unsigned SLocBufferBlobAbbrv) { unsigned SLocBufferBlobAbbrv) {

using RecordDataType = ASTWriter::RecordData::value_type; using RecordDataType = ASTWriter::RecordData::value_type;

// Compress the buffer if possible. We expect that almost all PCM // Compress the buffer if possible. We expect that almost all PCM

// consumers will not want its contents. // consumers will not want its contents.

if (CompressionKind CompressionScheme = CompressionKind::Zlib) {

SmallVector<uint8_t, 0> CompressedBuffer; SmallVector<uint8_t, 0> CompressedBuffer;

dblaikie: Doesn't this cause slicing & end up with the base implementation?

(Also, the base class CompressionAlgorithm has no virtual functions, so I'm not sure how this is meant to work. Does this code all work? Then I must be missing some things. How does this work?)

ckissane (author): You are correct to observe that this patch does not fully pass around pointers to instances of the classes. However, because I don't pass pointers, and because the compression classes are currently repetitive, this still functions correctly.
In short, a follow-up patch (which I will upload shortly) will convert this to using class instances and passing those around, including reworking functions throughout llvm-project to take advantage of this.
I am aiming for this two-step process to avoid making an already large patch larger.
Let me know if you have any concerns or ideas.

dblaikie: But I'm not sure how this patch works correctly - wouldn't the line below (`CompressionScheme.supported()`) call `CompressionAlgorithm::supported()`, which always returns false?

ckissane (author): Good catch.
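The slicing concern raised in this thread can be reproduced with a standalone sketch. `Base`, `Derived`, `viaValue`, and `viaReference` are hypothetical names standing in for the patch's classes; the sketch assumes a virtual `supported()` whose base version returns false, as described in the comments above.

```cpp
#include <cassert>

// Stand-in for a base class whose default implementation reports "unsupported".
struct Base {
  virtual ~Base() = default;
  virtual bool supported() const { return false; }
};

// Stand-in for a concrete algorithm that overrides the base behavior.
struct Derived : Base {
  bool supported() const override { return true; }
};

// Taking the parameter by value copies only the Base subobject (slicing), so
// the call resolves to Base::supported().
bool viaValue(Base B) { return B.supported(); }

// Taking it by reference preserves the dynamic type, so the override runs.
bool viaReference(const Base &B) { return B.supported(); }
```

Passing a `Derived` to `viaValue` yields false, while `viaReference` yields true, which is why the follow-up discussion moves toward passing pointers or references to algorithm instances.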

if (llvm::compression::zlib::isAvailable()) {

llvm::compression::zlib::compress( CompressionScheme->compress(llvm::arrayRefFromStringRef(Blob.drop_back(1)),

llvm::arrayRefFromStringRef(Blob.drop_back(1)), CompressedBuffer); CompressedBuffer);

RecordDataType Record[] = {SM_SLOC_BUFFER_BLOB_COMPRESSED, Blob.size() - 1}; RecordDataType Record[] = {SM_SLOC_BUFFER_BLOB_COMPRESSED, Blob.size() - 1};

dblaikie:

// consumers will not want its contents.

- CompressionKind CompressionScheme = CompressionKind::Zlib;

- if (CompressionScheme) {

+ if (CompressionKind CompressionScheme = CompressionKind::Zlib) {

SmallVector<uint8_t, 0> CompressedBuffer;

Generally LLVM style rolls these together.

Stream.EmitRecordWithBlob(SLocBufferBlobCompressedAbbrv, Record, Stream.EmitRecordWithBlob(SLocBufferBlobCompressedAbbrv, Record,

llvm::toStringRef(CompressedBuffer)); llvm::toStringRef(CompressedBuffer));

return; return;

} }

RecordDataType Record[] = {SM_SLOC_BUFFER_BLOB}; RecordDataType Record[] = {SM_SLOC_BUFFER_BLOB};

Stream.EmitRecordWithBlob(SLocBufferBlobAbbrv, Record, Blob); Stream.EmitRecordWithBlob(SLocBufferBlobAbbrv, Record, Blob);

} }

▲ Show 20 Lines • Show All 4,940 Lines • Show Last 20 Lines

lld/ELF/Driver.cpp

Show First 20 Lines • Show All 64 Lines • ▼ Show 20 Lines
#include <cstdlib>		#include <cstdlib>
#include <utility>		#include <utility>

using namespace llvm;		using namespace llvm;
using namespace llvm::ELF;		using namespace llvm::ELF;
using namespace llvm::object;		using namespace llvm::object;
using namespace llvm::sys;		using namespace llvm::sys;
using namespace llvm::support;		using namespace llvm::support;
		using namespace llvm::compression;
using namespace lld;		using namespace lld;
using namespace lld::elf;		using namespace lld::elf;

std::unique_ptr<Configuration> elf::config;		std::unique_ptr<Configuration> elf::config;
std::unique_ptr<Ctx> elf::ctx;		std::unique_ptr<Ctx> elf::ctx;
std::unique_ptr<LinkerDriver> elf::driver;		std::unique_ptr<LinkerDriver> elf::driver;

static void setConfigs(opt::InputArgList &args);		static void setConfigs(opt::InputArgList &args);
▲ Show 20 Lines • Show All 868 Lines • ▼ Show 20 Lines	for (uint32_t i = 0, size = cgProfile.size(); i < size; ++i) {
if (from && to)		if (from && to)
config->callGraphProfile[{from, to}] += cgpe.cgp_weight;		config->callGraphProfile[{from, to}] += cgpe.cgp_weight;
}		}
}		}
}		}

static bool getCompressDebugSections(opt::InputArgList &args) {		static bool getCompressDebugSections(opt::InputArgList &args) {
StringRef s = args.getLastArgValue(OPT_compress_debug_sections, "none");		StringRef s = args.getLastArgValue(OPT_compress_debug_sections, "none");
if (s == "none")		if (s == "none") {
return false;		return false;
if (s != "zlib")		} else if (s == "zlib") {
error("unknown --compress-debug-sections value: " + s);		if (!CompressionKind::Zlib)
if (!compression::zlib::isAvailable())
error("--compress-debug-sections: zlib is not available");		error("--compress-debug-sections: zlib is not available");
		} else {
		error("unknown --compress-debug-sections value: " + s);
		}

return true;		return true;
}		}

static StringRef getAliasSpelling(opt::Arg *arg) {		static StringRef getAliasSpelling(opt::Arg *arg) {
if (const opt::Arg *alias = arg->getAlias())		if (const opt::Arg *alias = arg->getAlias())
return alias->getSpelling();		return alias->getSpelling();
return arg->getSpelling();		return arg->getSpelling();
}		}
▲ Show 20 Lines • Show All 1,849 Lines • Show Last 20 Lines

lld/ELF/InputSection.cpp

Show All 22 Lines
#include <algorithm>		#include <algorithm>
#include <mutex>		#include <mutex>
#include <vector>		#include <vector>

using namespace llvm;		using namespace llvm;
using namespace llvm::ELF;		using namespace llvm::ELF;
using namespace llvm::object;		using namespace llvm::object;
using namespace llvm::support;		using namespace llvm::support;
		using namespace llvm::compression;
using namespace llvm::support::endian;		using namespace llvm::support::endian;
using namespace llvm::sys;		using namespace llvm::sys;
using namespace lld;		using namespace lld;
using namespace lld::elf;		using namespace lld::elf;

SmallVector<InputSectionBase *, 0> elf::inputSections;		SmallVector<InputSectionBase *, 0> elf::inputSections;
SmallVector<EhInputSection *, 0> elf::ehInputSections;		SmallVector<EhInputSection *, 0> elf::ehInputSections;
DenseSet<std::pair<const Symbol *, uint64_t>> elf::ppc64noTocRelax;		DenseSet<std::pair<const Symbol *, uint64_t>> elf::ppc64noTocRelax;
▲ Show 20 Lines • Show All 75 Lines • ▼ Show 20 Lines	void InputSectionBase::uncompress() const {
size_t size = uncompressedSize;		size_t size = uncompressedSize;
uint8_t *uncompressedBuf;		uint8_t *uncompressedBuf;
{		{
static std::mutex mu;		static std::mutex mu;
std::lock_guard<std::mutex> lock(mu);		std::lock_guard<std::mutex> lock(mu);
uncompressedBuf = bAlloc().Allocate<uint8_t>(size);		uncompressedBuf = bAlloc().Allocate<uint8_t>(size);
}		}

if (Error e = compression::zlib::uncompress(rawData, uncompressedBuf, size))		if (Error e =
		CompressionKind::Zlib->decompress(rawData, uncompressedBuf, size))
fatal(toString(this) +		fatal(toString(this) +
": uncompress failed: " + llvm::toString(std::move(e)));		": uncompress failed: " + llvm::toString(std::move(e)));
rawData = makeArrayRef(uncompressedBuf, size);		rawData = makeArrayRef(uncompressedBuf, size);
uncompressedSize = -1;		uncompressedSize = -1;
}		}

template <class ELFT> RelsOrRelas<ELFT> InputSectionBase::relsOrRelas() const {		template <class ELFT> RelsOrRelas<ELFT> InputSectionBase::relsOrRelas() const {
if (relSecIdx == 0)		if (relSecIdx == 0)
▲ Show 20 Lines • Show All 73 Lines • ▼ Show 20 Lines	template <typename ELFT> void InputSectionBase::parseCompressedHeader() {
// New-style header		// New-style header
if (rawData.size() < sizeof(typename ELFT::Chdr)) {		if (rawData.size() < sizeof(typename ELFT::Chdr)) {
error(toString(this) + ": corrupted compressed section");		error(toString(this) + ": corrupted compressed section");
return;		return;
}		}

auto hdr = reinterpret_cast<const typename ELFT::Chdr >(rawData.data());		auto hdr = reinterpret_cast<const typename ELFT::Chdr >(rawData.data());
if (hdr->ch_type == ELFCOMPRESS_ZLIB) {		if (hdr->ch_type == ELFCOMPRESS_ZLIB) {
if (!compression::zlib::isAvailable())		if (!CompressionKind::Zlib)
error(toString(this) + " is compressed with ELFCOMPRESS_ZLIB, but lld is "		error(toString(this) + " is compressed with ELFCOMPRESS_ZLIB, but lld is "
"not built with zlib support");		"not built with zlib support");
} else {		} else {
error(toString(this) + ": unsupported compression type (" +		error(toString(this) + ": unsupported compression type (" +
Twine(hdr->ch_type) + ")");		Twine(hdr->ch_type) + ")");
return;		return;
}		}

▲ Show 20 Lines • Show All 994 Lines • ▼ Show 20 Lines	if (LLVM_UNLIKELY(type == SHT_GROUP)) {
copyShtGroup<ELFT>(buf);		copyShtGroup<ELFT>(buf);
return;		return;
}		}

// If this is a compressed section, uncompress section contents directly		// If this is a compressed section, uncompress section contents directly
// to the buffer.		// to the buffer.
if (uncompressedSize >= 0) {		if (uncompressedSize >= 0) {
size_t size = uncompressedSize;		size_t size = uncompressedSize;
if (Error e = compression::zlib::uncompress(rawData, buf, size))		if (Error e = CompressionKind::Zlib->decompress(rawData, buf, size))
fatal(toString(this) +		fatal(toString(this) +
": uncompress failed: " + llvm::toString(std::move(e)));		": uncompress failed: " + llvm::toString(std::move(e)));
uint8_t *bufEnd = buf + size;		uint8_t *bufEnd = buf + size;
relocate<ELFT>(buf, bufEnd);		relocate<ELFT>(buf, bufEnd);
return;		return;
}		}

// Copy section contents from source object file to output file		// Copy section contents from source object file to output file
// and then apply relocations.		// and then apply relocations.
		leonardchan: `typename` might not be needed here
memcpy(buf, rawData.data(), rawData.size());		memcpy(buf, rawData.data(), rawData.size());
relocate<ELFT>(buf, buf + rawData.size());		relocate<ELFT>(buf, buf + rawData.size());
}		}

void InputSection::replace(InputSection *other) {		void InputSection::replace(InputSection *other) {
alignment = std::max(alignment, other->alignment);		alignment = std::max(alignment, other->alignment);

// When a section is replaced with another section that was allocated to		// When a section is replaced with another section that was allocated to
▲ Show 20 Lines • Show All 219 Lines • Show Last 20 Lines

llvm/include/llvm/Object/Decompressor.h

//===-- Decompressor.h ------------------------------------------- C++ --===//		//===-- Decompressor.h ------------------------------------------- C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===/		//===----------------------------------------------------------------------===/

#ifndef LLVM_OBJECT_DECOMPRESSOR_H		#ifndef LLVM_OBJECT_DECOMPRESSOR_H
#define LLVM_OBJECT_DECOMPRESSOR_H		#define LLVM_OBJECT_DECOMPRESSOR_H

#include "llvm/ADT/ArrayRef.h"		#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
		#include "llvm/Support/Compression.h"
#include "llvm/Support/Error.h"		#include "llvm/Support/Error.h"

namespace llvm {		namespace llvm {
namespace object {		namespace object {

/// Decompressor helps to handle decompression of compressed sections.		/// Decompressor helps to handle decompression of compressed sections.
class Decompressor {		class Decompressor {
public:		public:
Show All 17 Lines	public:
Error decompress(MutableArrayRef<uint8_t> Buffer);		Error decompress(MutableArrayRef<uint8_t> Buffer);

/// Return memory buffer size required for decompression.		/// Return memory buffer size required for decompression.
uint64_t getDecompressedSize() { return DecompressedSize; }		uint64_t getDecompressedSize() { return DecompressedSize; }

private:		private:
Decompressor(StringRef Data);		Decompressor(StringRef Data);

Error consumeCompressedZLibHeader(bool Is64Bit, bool IsLittleEndian);		Error consumeCompressedSectionHeader(bool Is64Bit, bool IsLittleEndian);

StringRef SectionData;		StringRef SectionData;
uint64_t DecompressedSize;		uint64_t DecompressedSize;
		compression::CompressionKind CompressionScheme =
		compression::CompressionKind::Zlib;
};		};

} // end namespace object		} // end namespace object
} // end namespace llvm		} // end namespace llvm

#endif // LLVM_OBJECT_DECOMPRESSOR_H		#endif // LLVM_OBJECT_DECOMPRESSOR_H

llvm/include/llvm/ProfileData/InstrProf.h

	Show All 20 Lines
	#include "llvm/ADT/StringRef.h"			#include "llvm/ADT/StringRef.h"
	#include "llvm/ADT/StringSet.h"			#include "llvm/ADT/StringSet.h"
	#include "llvm/ADT/Triple.h"			#include "llvm/ADT/Triple.h"
	#include "llvm/IR/GlobalValue.h"			#include "llvm/IR/GlobalValue.h"
	#include "llvm/IR/ProfileSummary.h"			#include "llvm/IR/ProfileSummary.h"
	#include "llvm/ProfileData/InstrProfData.inc"			#include "llvm/ProfileData/InstrProfData.inc"
	#include "llvm/Support/CommandLine.h"			#include "llvm/Support/CommandLine.h"
	#include "llvm/Support/Compiler.h"			#include "llvm/Support/Compiler.h"
				#include "llvm/Support/Compression.h"
	#include "llvm/Support/Endian.h"			#include "llvm/Support/Endian.h"
	#include "llvm/Support/Error.h"			#include "llvm/Support/Error.h"
	#include "llvm/Support/ErrorHandling.h"			#include "llvm/Support/ErrorHandling.h"
	#include "llvm/Support/Host.h"			#include "llvm/Support/Host.h"
	#include "llvm/Support/MD5.h"			#include "llvm/Support/MD5.h"
	#include "llvm/Support/MathExtras.h"			#include "llvm/Support/MathExtras.h"
	#include "llvm/Support/raw_ostream.h"			#include "llvm/Support/raw_ostream.h"
	#include <algorithm>			#include <algorithm>
	▲ Show 20 Lines • Show All 172 Lines • ▼ Show 20 Lines
	/// method generates a combined string \c Result that is ready to be			/// method generates a combined string \c Result that is ready to be
	/// serialized. The \c Result string is comprised of three fields:			/// serialized. The \c Result string is comprised of three fields:
	/// The first field is the length of the uncompressed strings, and the			/// The first field is the length of the uncompressed strings, and the
	/// the second field is the length of the zlib-compressed string.			/// the second field is the length of the zlib-compressed string.
	/// Both fields are encoded in ULEB128. If \c doCompress is false, the			/// Both fields are encoded in ULEB128. If \c doCompress is false, the
	/// third field is the uncompressed strings; otherwise it is the			/// third field is the uncompressed strings; otherwise it is the
	/// compressed string. When the string compression is off, the			/// compressed string. When the string compression is off, the
	/// second field will have value zero.			/// second field will have value zero.
	Error collectPGOFuncNameStrings(ArrayRef<std::string> NameStrs,			Error collectPGOFuncNameStrings(
	bool doCompression, std::string &Result);			ArrayRef<std::string> NameStrs,
				compression::OptionalCompressionKind OptionalCompressionScheme,
				std::string &Result);

	/// Produce \c Result string with the same format described above. The input			/// Produce \c Result string with the same format described above. The input
	/// is vector of PGO function name variables that are referenced.			/// is vector of PGO function name variables that are referenced.
	Error collectPGOFuncNameStrings(ArrayRef<GlobalVariable *> NameVars,			Error collectPGOFuncNameStrings(ArrayRef<GlobalVariable *> NameVars,
	std::string &Result, bool doCompression = true);			std::string &Result, bool doCompression = true);

	/// \c NameStrings is a string composed of one of more sub-strings encoded in			/// \c NameStrings is a string composed of one of more sub-strings encoded in
	/// the format described above. The substrings are separated by 0 or more zero			/// the format described above. The substrings are separated by 0 or more zero
	▲ Show 20 Lines • Show All 978 Lines • Show Last 20 Lines
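The three-field layout documented for `collectPGOFuncNameStrings` above (ULEB128 uncompressed length, ULEB128 compressed length, then the payload) can be sketched without LLVM as follows. `appendULEB128` and `packNames` are hypothetical helpers, and treating an empty `Compressed` argument as "compression off" is an assumption made for this illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Append V in ULEB128: 7 bits per byte, least-significant group first, with
// the top bit set on every byte except the last.
void appendULEB128(uint64_t V, std::string &Out) {
  do {
    uint8_t Byte = static_cast<uint8_t>(V & 0x7f);
    V >>= 7;
    if (V != 0)
      Byte |= 0x80;
    Out.push_back(static_cast<char>(Byte));
  } while (V != 0);
}

// Build the combined string. Assumption: an empty Compressed string means
// compression is off, in which case the second field is zero and the third
// field is the uncompressed data.
std::string packNames(const std::string &Uncompressed,
                      const std::string &Compressed) {
  std::string Result;
  appendULEB128(Uncompressed.size(), Result);
  appendULEB128(Compressed.size(), Result);
  Result += Compressed.empty() ? Uncompressed : Compressed;
  return Result;
}
```

A reader can tell the two cases apart from the second field alone: zero means the payload is the raw name strings, and any other value is the byte length of the compressed payload.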

llvm/include/llvm/Support/Compression.h

	//===-- llvm/Support/Compression.h ---Compression----------------- C++ --===//			//===-- llvm/Support/Compression.h ---Compression----------------- C++ --===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// This file contains basic functions for compression/uncompression.			// This file contains basic functions for compression/uncompression.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef LLVM_SUPPORT_COMPRESSION_H			#ifndef LLVM_SUPPORT_COMPRESSION_H
	#define LLVM_SUPPORT_COMPRESSION_H			#define LLVM_SUPPORT_COMPRESSION_H

	#include "llvm/ADT/ArrayRef.h"			#include "llvm/ADT/ArrayRef.h"
				#include "llvm/ADT/Optional.h"
	#include "llvm/Support/DataTypes.h"			#include "llvm/Support/DataTypes.h"
				#include "llvm/Support/Error.h"
				#include "llvm/Support/ErrorHandling.h"

	namespace llvm {			namespace llvm {
	template <typename T> class SmallVectorImpl;			template <typename T> class SmallVectorImpl;
	class Error;			class Error;

	namespace compression {			namespace compression {
	namespace zlib {

	constexpr int NoCompression = 0;			struct CompressionAlgorithm {
	constexpr int BestSpeedCompression = 1;			const StringRef Name;
	constexpr int DefaultCompression = 6;			const int BestSpeedLevel;
	constexpr int BestSizeCompression = 9;			const int DefaultLevel;
				const int BestSizeLevel;
	bool isAvailable();			virtual void compress(ArrayRef<uint8_t> Input,

	void compress(ArrayRef<uint8_t> Input,
	SmallVectorImpl<uint8_t> &CompressedBuffer,			SmallVectorImpl<uint8_t> &CompressedBuffer,
	int Level = DefaultCompression);			int Level) = 0;
				virtual Error decompress(ArrayRef<uint8_t> Input, uint8_t *UncompressedBuffer,
	Error uncompress(ArrayRef<uint8_t> Input, uint8_t *UncompressedBuffer,			size_t &UncompressedSize) = 0;
	size_t &UncompressedSize);

	Error uncompress(ArrayRef<uint8_t> Input,
	SmallVectorImpl<uint8_t> &UncompressedBuffer,
	size_t UncompressedSize);

	} // End of namespace zlib

	namespace zstd {

	constexpr int NoCompression = -5;
	constexpr int BestSpeedCompression = 1;
	constexpr int DefaultCompression = 5;
	constexpr int BestSizeCompression = 12;

	bool isAvailable();

	void compress(ArrayRef<uint8_t> Input,			void compress(ArrayRef<uint8_t> Input,
	SmallVectorImpl<uint8_t> &CompressedBuffer,			SmallVectorImpl<uint8_t> &CompressedBuffer) {
	int Level = DefaultCompression);			return compress(Input, CompressedBuffer, this->DefaultLevel);
				leonardchan (Not Done): `this->` might not be needed here
				}
	Error uncompress(ArrayRef<uint8_t> Input, uint8_t *UncompressedBuffer,
	size_t &UncompressedSize);

	Error uncompress(ArrayRef<uint8_t> Input,			Error decompress(ArrayRef<uint8_t> Input,
	SmallVectorImpl<uint8_t> &UncompressedBuffer,			SmallVectorImpl<uint8_t> &UncompressedBuffer,
	size_t UncompressedSize);			size_t UncompressedSize) {
				UncompressedBuffer.resize_for_overwrite(UncompressedSize);
	} // End of namespace zstd			Error E = decompress(Input, UncompressedBuffer.data(), UncompressedSize);
				if (UncompressedSize < UncompressedBuffer.size())
				UncompressedBuffer.truncate(UncompressedSize);
				return E;
				}
				dblaikie (Not Done): Could we skip this wrapper & just have the underlying function? (Also, we shouldn't be overloading by case like this anyway - please name all the decompress/compress functions with the same case/spelling.)
				ckissane (Author, Done): good point

				protected:
				CompressionAlgorithm(StringRef Name, int BestSpeedLevel, int DefaultLevel,
				int BestSizeLevel)
				: Name(Name), BestSpeedLevel(BestSpeedLevel), DefaultLevel(DefaultLevel),
				BestSizeLevel(BestSizeLevel) {}
				};

				leonardchan (Not Done): Does the `uncompress` version of this just end up calling into the other `uncompress` function? If so, we could probably just have one `decompress` virtual method here and the one that accepts a `SmallVectorImpl` just calls into the virtual `decompress` rather than have two virtual methods that would do the same thing. It looks like you've done that in `CompressionAlgorithmImpl`, but I think it could be moved here.
				class CompressionKind {
				private:
				uint8_t CompressionID;
				leonardchan (Not Done): Perhaps add some comments for these functions? At least for me, it's not entirely clear what these are for.

				protected:
				friend constexpr llvm::Optional<CompressionKind>
				getOptionalCompressionKind(uint8_t OptionalCompressionID);
				// because getOptionalCompressionKind is the only friend:
				// we can trust the value of y is valid
				leonardchan (Not Done): Perhaps it would be simpler to just have the individual subclasses inherit from `CompressionAlgorithm` rather than have them all go through `CompressionAlgorithmImpl`? It looks like each child class with methods like `getAlgorithmId` can just return the static values themselves rather than passing them up to a parent to be returned. Unless some static polymorphism is needed here, I think CRTP might not be needed.
				constexpr CompressionKind(uint8_t CompressionID)
				: CompressionID(CompressionID) {}

				public:
				dblaikie (Not Done): Rather than `supported()` maybe the accessor functions could return nullptr when support isn't available? `if (CompressionAlgorithm *A = getZstdCompressionScheme())` etc. Though I guess that doesn't allow for a default implementation - I guess an alternative function could be `CompressionAlgorithm& getCompressionSchemeOrNone(Zstd)` which always gives a valid `CompressionAlgorithm` by giving the do-nothing compression algorithm when the specified one is not available. But I guess we don't generally want to silently fall back to null compression, because the streams we're producing always need to know if they have to emit headers, etc, or not? So maybe there's no need for a default?
				constexpr operator uint8_t() const { return CompressionID; }
				CompressionAlgorithm *operator->() const;

				constexpr operator bool() const;

				static const llvm::compression::CompressionKind Unknown, Zlib, ZStd;
				};
				constexpr inline const llvm::compression::CompressionKind
				llvm::compression::CompressionKind::Unknown{255}, ///< Abstract compression
				llvm::compression::CompressionKind::Zlib{1}, ///< zlib style complession
				llvm::compression::CompressionKind::ZStd{2}; ///< zstd style complession
				typedef llvm::Optional<CompressionKind> OptionalCompressionKind;

				constexpr CompressionKind::operator bool() const {
				leonardchan (Not Done): I think this cast might not be needed
				switch (uint8_t(CompressionID)) {
				case uint8_t(CompressionKind::Zlib):
				return LLVM_ENABLE_ZLIB;
				case uint8_t(CompressionKind::ZStd):
				return LLVM_ENABLE_ZSTD;
				default:
				return false;
				}
				}

				constexpr bool operator==(CompressionKind Left, CompressionKind Right) {
				return uint8_t(Left) == uint8_t(Right);
				}

				constexpr OptionalCompressionKind
				getOptionalCompressionKind(uint8_t OptionalCompressionID) {
				switch (OptionalCompressionID) {
				case uint8_t(0):
				return NoneType();
				case uint8_t(CompressionKind::Zlib):
				leonardchan (Not Done): I think this cast might not be needed
				case uint8_t(CompressionKind::ZStd):
				return CompressionKind(OptionalCompressionID);
				default:
				return CompressionKind::Unknown;
				}
				}
				leonardchan (Not Done): https://llvm.org/docs/CodingStandards.html#use-of-class-and-struct-keywords since the ctor is non-public

	} // End of namespace compression			} // End of namespace compression

	} // End of namespace llvm			} // End of namespace llvm

	#endif			#endif
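
Editorial sketch of the shape leonardchan's inline comments on this header point toward: one virtual `decompress` on the base class, a non-virtual vector overload that forwards to it, and subclasses that inherit directly with no CRTP layer. This is not code from the patch. It uses `std::vector` and a `std::string` error in place of `SmallVectorImpl` and `llvm::Error` so that it builds without LLVM, and `StoredAlgorithm` is a hypothetical copy-only algorithm that exists purely to exercise the wrapper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for the patch's CompressionAlgorithm base class.
class CompressionAlgorithm {
public:
  virtual ~CompressionAlgorithm() = default;

  // The only virtual entry point. On entry UncompressedSize is the capacity
  // of UncompressedBuffer; on success it holds the number of bytes written.
  // Returns an empty string on success, an error message otherwise.
  virtual std::string decompress(const std::vector<uint8_t> &Input,
                                 uint8_t *UncompressedBuffer,
                                 size_t &UncompressedSize) const = 0;

  // Non-virtual convenience overload: size the vector, forward to the
  // virtual method, then shrink to what was actually produced.
  std::string decompress(const std::vector<uint8_t> &Input,
                         std::vector<uint8_t> &UncompressedBuffer,
                         size_t UncompressedSize) const {
    UncompressedBuffer.resize(UncompressedSize);
    std::string Err =
        decompress(Input, UncompressedBuffer.data(), UncompressedSize);
    if (UncompressedSize < UncompressedBuffer.size())
      UncompressedBuffer.resize(UncompressedSize);
    return Err;
  }
};

// Hypothetical algorithm that stores bytes verbatim (no real compression).
class StoredAlgorithm final : public CompressionAlgorithm {
public:
  // Overriding one overload hides the others; re-expose the vector overload.
  using CompressionAlgorithm::decompress;

  std::string decompress(const std::vector<uint8_t> &Input,
                         uint8_t *UncompressedBuffer,
                         size_t &UncompressedSize) const override {
    assert((UncompressedBuffer || UncompressedSize == 0) &&
           "null output buffer with non-zero capacity");
    if (Input.size() > UncompressedSize)
      return "output buffer too small";
    if (!Input.empty())
      std::memcpy(UncompressedBuffer, Input.data(), Input.size());
    UncompressedSize = Input.size();
    return "";
  }
};
```

With this layout each concrete algorithm implements only the raw-buffer method, and the resize/truncate logic lives in one place.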

llvm/lib/MC/ELFObjectWriter.cpp

	Show First 20 Lines • Show All 841 Lines • ▼ Show 20 Lines

	void ELFWriter::writeSectionData(const MCAssembler &Asm, MCSection &Sec,			void ELFWriter::writeSectionData(const MCAssembler &Asm, MCSection &Sec,
	const MCAsmLayout &Layout) {			const MCAsmLayout &Layout) {
	MCSectionELF &Section = static_cast<MCSectionELF &>(Sec);			MCSectionELF &Section = static_cast<MCSectionELF &>(Sec);
	StringRef SectionName = Section.getName();			StringRef SectionName = Section.getName();

	auto &MC = Asm.getContext();			auto &MC = Asm.getContext();
	const auto &MAI = MC.getAsmInfo();			const auto &MAI = MC.getAsmInfo();
				const DebugCompressionType CompressionType = MAI->compressDebugSections();
	bool CompressionEnabled =			bool CompressionEnabled = CompressionType != DebugCompressionType::None;
	MAI->compressDebugSections() != DebugCompressionType::None;
	if (!CompressionEnabled || !SectionName.startswith(".debug_")) {			if (!CompressionEnabled || !SectionName.startswith(".debug_")) {
	Asm.writeSectionData(W.OS, &Section, Layout);			Asm.writeSectionData(W.OS, &Section, Layout);
	return;			return;
	}			}

	assert(MAI->compressDebugSections() == DebugCompressionType::Z &&			assert(CompressionType == DebugCompressionType::Z &&
	"expected zlib style compression");			"expected zlib style compression");

	SmallVector<char, 128> UncompressedData;			SmallVector<char, 128> UncompressedData;
	raw_svector_ostream VecOS(UncompressedData);			raw_svector_ostream VecOS(UncompressedData);
	Asm.writeSectionData(VecOS, &Section, Layout);			Asm.writeSectionData(VecOS, &Section, Layout);

	SmallVector<uint8_t, 128> Compressed;			SmallVector<uint8_t, 128> Compressed;
	const uint32_t ChType = ELF::ELFCOMPRESS_ZLIB;			const uint32_t ChType = ELF::ELFCOMPRESS_ZLIB;
	compression::zlib::compress(			compression::CompressionKind::Zlib->compress(
	makeArrayRef(reinterpret_cast<uint8_t *>(UncompressedData.data()),			makeArrayRef(reinterpret_cast<uint8_t *>(UncompressedData.data()),
	UncompressedData.size()),			UncompressedData.size()),
	Compressed);			Compressed);

	if (!maybeWriteCompression(ChType, UncompressedData.size(), Compressed,			if (!maybeWriteCompression(ChType, UncompressedData.size(), Compressed,
	Sec.getAlignment())) {			Sec.getAlignment())) {
	W.OS << UncompressedData;			W.OS << UncompressedData;
	return;			return;
	▲ Show 20 Lines • Show All 635 Lines • Show Last 20 Lines

llvm/lib/ObjCopy/ELF/ELFObject.cpp

Show First 20 Lines • Show All 433 Lines • ▼ Show 20 Lines	Error SectionWriter::visit(const OwnedDataSection &Sec) {
return Error::success();		return Error::success();
}		}

template <class ELFT>		template <class ELFT>
Error ELFSectionWriter<ELFT>::visit(const DecompressedSection &Sec) {		Error ELFSectionWriter<ELFT>::visit(const DecompressedSection &Sec) {
ArrayRef<uint8_t> Compressed =		ArrayRef<uint8_t> Compressed =
Sec.OriginalData.slice(sizeof(Elf_Chdr_Impl<ELFT>));		Sec.OriginalData.slice(sizeof(Elf_Chdr_Impl<ELFT>));
SmallVector<uint8_t, 128> DecompressedContent;		SmallVector<uint8_t, 128> DecompressedContent;
if (Error Err = compression::zlib::uncompress(Compressed, DecompressedContent,		DebugCompressionType CompressionType =
static_cast<size_t>(Sec.Size)))		reinterpret_cast<const Elf_Chdr_Impl<ELFT> *>(Sec.OriginalData.data())
		->ch_type == ELF::ELFCOMPRESS_ZLIB
		? DebugCompressionType::Z
		: DebugCompressionType::None;

		switch (CompressionType) {
		case DebugCompressionType::Z:
		if (Error Err1 = compression::CompressionKind::Zlib->decompress(
		Compressed, DecompressedContent, static_cast<size_t>(Sec.Size))) {
return createStringError(errc::invalid_argument,		return createStringError(errc::invalid_argument,
"'" + Sec.Name + "': " + toString(std::move(Err)));		"'" + Sec.Name +
		"': " + toString(std::move(Err1)));
		}
		break;
		case DebugCompressionType::None:
		llvm_unreachable("unexpected DebugCompressionType::None");
		break;
		}

		leonardchan (Not Done): What's the explanation for having the `llvm_unreachable` branch and getting the compression type? I would've thought this section would just be:
		  if (Error Err1 = compression::CompressionKind::Zlib->decompress(
		          Compressed, DecompressedContent, static_cast<size_t>(Sec.Size))) {
		    return createStringError(errc::invalid_argument,
		                             "'" + Sec.Name + "': " + toString(std::move(Err1)));
		  }
		which looks like it would have identical behavior to the old code.
uint8_t *Buf = reinterpret_cast<uint8_t *>(Out.getBufferStart()) + Sec.Offset;		uint8_t *Buf = reinterpret_cast<uint8_t *>(Out.getBufferStart()) + Sec.Offset;
std::copy(DecompressedContent.begin(), DecompressedContent.end(), Buf);		std::copy(DecompressedContent.begin(), DecompressedContent.end(), Buf);

return Error::success();		return Error::success();
}		}

Error BinarySectionWriter::visit(const DecompressedSection &Sec) {		Error BinarySectionWriter::visit(const DecompressedSection &Sec) {
return createStringError(errc::operation_not_permitted,		return createStringError(errc::operation_not_permitted,
▲ Show 20 Lines • Show All 52 Lines • ▼ Show 20 Lines	Error ELFSectionWriter<ELFT>::visit(const CompressedSection &Sec) {
std::copy(Sec.CompressedData.begin(), Sec.CompressedData.end(), Buf);		std::copy(Sec.CompressedData.begin(), Sec.CompressedData.end(), Buf);
return Error::success();		return Error::success();
}		}

CompressedSection::CompressedSection(const SectionBase &Sec,		CompressedSection::CompressedSection(const SectionBase &Sec,
DebugCompressionType CompressionType)		DebugCompressionType CompressionType)
: SectionBase(Sec), CompressionType(CompressionType),		: SectionBase(Sec), CompressionType(CompressionType),
DecompressedSize(Sec.OriginalData.size()), DecompressedAlign(Sec.Align) {		DecompressedSize(Sec.OriginalData.size()), DecompressedAlign(Sec.Align) {
compression::zlib::compress(OriginalData, CompressedData);		switch (CompressionType) {
		case DebugCompressionType::Z:
		compression::CompressionKind::Zlib->compress(OriginalData, CompressedData);
		break;
		case DebugCompressionType::None:
		break;
		}

		leonardchan (Not Done): Same here. Should this just be `compression::CompressionKind::Zlib->compress(OriginalData, CompressedData);`? If this is in preparation for the ELF+zstd changes, perhaps we should save those for another patch once that lands?
assert(CompressionType != DebugCompressionType::None);		assert(CompressionType != DebugCompressionType::None);
Flags |= ELF::SHF_COMPRESSED;		Flags |= ELF::SHF_COMPRESSED;
size_t ChdrSize =		size_t ChdrSize =
std::max(std::max(sizeof(object::Elf_Chdr_Impl<object::ELF64LE>),		std::max(std::max(sizeof(object::Elf_Chdr_Impl<object::ELF64LE>),
sizeof(object::Elf_Chdr_Impl<object::ELF64BE>)),		sizeof(object::Elf_Chdr_Impl<object::ELF64BE>)),
std::max(sizeof(object::Elf_Chdr_Impl<object::ELF32LE>),		std::max(sizeof(object::Elf_Chdr_Impl<object::ELF32LE>),
sizeof(object::Elf_Chdr_Impl<object::ELF32BE>)));		sizeof(object::Elf_Chdr_Impl<object::ELF32BE>)));
Size = ChdrSize + CompressedData.size();		Size = ChdrSize + CompressedData.size();
▲ Show 20 Lines • Show All 2,236 Lines • Show Last 20 Lines

llvm/lib/Object/Decompressor.cpp

Show All 13 Lines

#include "llvm/Support/Endian.h" #include "llvm/Support/Endian.h"

using namespace llvm; using namespace llvm;

using namespace llvm::support::endian; using namespace llvm::support::endian;

using namespace object; using namespace object;

Expected<Decompressor> Decompressor::create(StringRef Name, StringRef Data, Expected<Decompressor> Decompressor::create(StringRef Name, StringRef Data,

bool IsLE, bool Is64Bit) { bool IsLE, bool Is64Bit) {

if (!compression::zlib::isAvailable())

return createError("zlib is not available");

Decompressor D(Data); Decompressor D(Data);

if (Error Err = D.consumeCompressedZLibHeader(Is64Bit, IsLE)) if (Error Err = D.consumeCompressedSectionHeader(Is64Bit, IsLE))

return std::move(Err); return std::move(Err);

return D; return D;

} }

Decompressor::Decompressor(StringRef Data) Decompressor::Decompressor(StringRef Data)

: SectionData(Data), DecompressedSize(0) {} : SectionData(Data), DecompressedSize(0) {}

Error Decompressor::consumeCompressedZLibHeader(bool Is64Bit, Error Decompressor::consumeCompressedSectionHeader(bool Is64Bit,

bool IsLittleEndian) { bool IsLittleEndian) {

using namespace ELF; using namespace ELF;

uint64_t HdrSize = Is64Bit ? sizeof(Elf64_Chdr) : sizeof(Elf32_Chdr); uint64_t HdrSize = Is64Bit ? sizeof(Elf64_Chdr) : sizeof(Elf32_Chdr);

if (SectionData.size() < HdrSize) if (SectionData.size() < HdrSize)

return createError("corrupted compressed section header"); return createError("corrupted compressed section header");

DataExtractor Extractor(SectionData, IsLittleEndian, 0); DataExtractor Extractor(SectionData, IsLittleEndian, 0);

uint64_t Offset = 0; uint64_t Offset = 0;

if (Extractor.getUnsigned(&Offset, Is64Bit ? sizeof(Elf64_Word) uint64_t ELFCompressionSchemeId = Extractor.getUnsigned(

: sizeof(Elf32_Word)) != &Offset, Is64Bit ? sizeof(Elf64_Word) : sizeof(Elf32_Word));

ELFCOMPRESS_ZLIB) if (ELFCompressionSchemeId == ELFCOMPRESS_ZLIB) {

CompressionScheme = compression::CompressionKind::Zlib;

} else {

return createError("unsupported compression type"); return createError("unsupported compression type");

}

if (!CompressionScheme)

return createError(CompressionScheme->Name + " is not available");

dblaikie (Not Done), suggested edit:

uint64_t Offset = 0;

- uint64_t ELFCompressionSchemeId = Extractor.getUnsigned(

- &Offset, Is64Bit ? sizeof(Elf64_Word) : sizeof(Elf32_Word));

- if (ELFCompressionSchemeId == ELFCOMPRESS_ZLIB) {

- CompressionScheme = compression::CompressionKind::Zlib;

- } else {

+ if (Extractor.getUnsigned(&Offset, Is64Bit ? sizeof(Elf64_Word)

+ : sizeof(Elf32_Word)) !=

+ ELFCOMPRESS_ZLIB)

return createError("unsupported compression type");

- }

+ CompressionScheme = compression::CompressionKind::Zlib;

if (!CompressionScheme)

+ return createError(CompressionScheme->Name + " is not available"); if (!CompressionScheme)

Maybe leave this code more like it was before - it can turn into a switch over ELFCompressionSchemeId when Zstd is added here.

// Skip Elf64_Chdr::ch_reserved field. // Skip Elf64_Chdr::ch_reserved field.

if (Is64Bit) if (Is64Bit)

Offset += sizeof(Elf64_Word); Offset += sizeof(Elf64_Word);

DecompressedSize = Extractor.getUnsigned( DecompressedSize = Extractor.getUnsigned(

&Offset, Is64Bit ? sizeof(Elf64_Xword) : sizeof(Elf32_Word)); &Offset, Is64Bit ? sizeof(Elf64_Xword) : sizeof(Elf32_Word));

SectionData = SectionData.substr(HdrSize); SectionData = SectionData.substr(HdrSize);

return Error::success(); return Error::success();

} }

Error Decompressor::decompress(MutableArrayRef<uint8_t> Buffer) { Error Decompressor::decompress(MutableArrayRef<uint8_t> Buffer) {

size_t Size = Buffer.size(); size_t Size = Buffer.size();

return compression::zlib::uncompress(arrayRefFromStringRef(SectionData), return CompressionScheme->decompress(arrayRefFromStringRef(SectionData),

Buffer.data(), Size); Buffer.data(), Size);

} }
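
Editorial sketch of the "switch over ELFCompressionSchemeId" that dblaikie mentions for when Zstd is added. It is an assumption about how such a switch might look, not code from this patch. `Kind` and `kindForChType` are hypothetical names; the two `ch_type` constants are the values defined by the ELF gABI.

```cpp
#include <cstdint>

// Elf_Chdr::ch_type values from the ELF gABI.
constexpr uint32_t ELFCOMPRESS_ZLIB = 1;
constexpr uint32_t ELFCOMPRESS_ZSTD = 2;

// Hypothetical stand-in for the patch's CompressionKind.
enum class Kind : uint8_t { Zlib = 1, Zstd = 2 };

// Map a ch_type read from the section header to a compression kind.
// Returns false (leaving Out untouched) for any unsupported type, which is
// where the caller would report "unsupported compression type".
inline bool kindForChType(uint64_t ChType, Kind &Out) {
  switch (ChType) {
  case ELFCOMPRESS_ZLIB:
    Out = Kind::Zlib;
    return true;
  case ELFCOMPRESS_ZSTD:
    Out = Kind::Zstd;
    return true;
  default:
    return false;
  }
}
```

Keeping the header-type check separate from the availability check preserves the two distinct error messages ("unsupported compression type" versus "... is not available").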

llvm/lib/ProfileData/Coverage/CoverageMappingReader.cpp

Show All 33 Lines

#include "llvm/Support/ErrorHandling.h"

#include "llvm/Support/LEB128.h"

#include "llvm/Support/MathExtras.h"

#include "llvm/Support/Path.h"

#include "llvm/Support/raw_ostream.h"

#include <vector>

using namespace llvm;

using namespace llvm::compression;

using namespace coverage;

using namespace object;

#define DEBUG_TYPE "coverage-mapping"

STATISTIC(CovMapNumRecords, "The # of coverage function records");

STATISTIC(CovMapNumUsedRecords, "The # of used coverage function records");

▲ Show 20 Lines • Show All 63 Lines • ▼ Show 20 Lines

Error RawCoverageFilenamesReader::read(CovMapVersion Version) {

uint64_t UncompressedLen;

if (auto Err = readULEB128(UncompressedLen))

return Err;

uint64_t CompressedLen;

if (auto Err = readSize(CompressedLen))

return Err;

if (CompressedLen > 0) {

if (!compression::zlib::isAvailable())

if (CompressionKind CompressionScheme = CompressionKind::Zlib) {

dblaikie (Not Done), suggested edit:

if (CompressedLen > 0) {

- if (!compression::CompressionKind::Zlib)

+ CompressionKind Compress = compression::CompressionKind::Zlib;

+ if (!Compress)

return make_error<CoverageMapError>(

coveragemap_error::decompression_failed);

// Allocate memory for the decompressed filenames.

SmallVector<uint8_t, 0> StorageBuf;

// Read compressed filenames.

StringRef CompressedFilenames = Data.substr(0, CompressedLen);

Data = Data.substr(CompressedLen);

- auto Err = compression::CompressionKind::Zlib->decompress(

+ auto Err = Compress->decompress(

arrayRefFromStringRef(CompressedFilenames), StorageBuf,

Be nice to share the same CompressionKind

return make_error<CoverageMapError>(

coveragemap_error::decompression_failed);

// Allocate memory for the decompressed filenames.

SmallVector<uint8_t, 0> StorageBuf;

// Read compressed filenames.

StringRef CompressedFilenames = Data.substr(0, CompressedLen);

Data = Data.substr(CompressedLen);

auto Err = compression::zlib::uncompress(

auto Err = CompressionScheme->decompress(

arrayRefFromStringRef(CompressedFilenames), StorageBuf,

UncompressedLen);

if (Err) {

consumeError(std::move(Err));

return make_error<CoverageMapError>(

coveragemap_error::decompression_failed);

}

RawCoverageFilenamesReader Delegate(toStringRef(StorageBuf), Filenames,

CompilationDir);

return Delegate.readUncompressed(Version, NumFilenames);

}

return make_error<CoverageMapError>(

coveragemap_error::decompression_failed);

}

return readUncompressed(Version, NumFilenames);

}

Error RawCoverageFilenamesReader::readUncompressed(CovMapVersion Version,

uint64_t NumFilenames) {

// Read uncompressed filenames.

if (Version < CovMapVersion::Version6) {

▲ Show 20 Lines • Show All 1,026 Lines • Show Last 20 Lines
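
Editorial sketch of the length fields the reader above consumes before deciding whether to decompress: an uncompressed length and a compressed length, where a compressed length of zero means the filenames are stored as-is. The ULEB128 routines follow the standard encoding. `FilenamesHeader` and `readHeader` are hypothetical names, `std::vector` stands in for the reader's `StringRef` cursor, and the assumption that both fields are ULEB128-encoded is taken from the writer's use of `encodeULEB128`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Append Value in unsigned LEB128 form: 7 payload bits per byte, with the
// high bit set on every byte except the last.
inline void encodeULEB128(uint64_t Value, std::vector<uint8_t> &Out) {
  do {
    uint8_t Byte = Value & 0x7f;
    Value >>= 7;
    if (Value != 0)
      Byte |= 0x80; // more bytes follow
    Out.push_back(Byte);
  } while (Value != 0);
}

// Decode one ULEB128 value starting at Pos and advance Pos past it.
// Returns false if the input ends early or runs past 10 bytes.
inline bool decodeULEB128(const std::vector<uint8_t> &In, size_t &Pos,
                          uint64_t &Value) {
  Value = 0;
  for (unsigned Shift = 0; Shift < 64; Shift += 7) {
    if (Pos >= In.size())
      return false;
    uint8_t Byte = In[Pos++];
    Value |= uint64_t(Byte & 0x7f) << Shift;
    if ((Byte & 0x80) == 0)
      return true;
  }
  return false;
}

// Hypothetical view of the two length fields.
struct FilenamesHeader {
  uint64_t UncompressedLen = 0;
  uint64_t CompressedLen = 0; // zero: payload is stored uncompressed
  bool isCompressed() const { return CompressedLen > 0; }
};

inline bool readHeader(const std::vector<uint8_t> &In, size_t &Pos,
                       FilenamesHeader &H) {
  return decodeULEB128(In, Pos, H.UncompressedLen) &&
         decodeULEB128(In, Pos, H.CompressedLen);
}
```

Only when `isCompressed()` is true does the reader need a compression algorithm at all, which is why the availability check can sit inside that branch.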

llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp

//===- CoverageMappingWriter.cpp - Code coverage mapping writer -----------===// //===- CoverageMappingWriter.cpp - Code coverage mapping writer -----------===//

// //

// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.

// See https://llvm.org/LICENSE.txt for license information. // See https://llvm.org/LICENSE.txt for license information.

// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

// //

//===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===//

// //

// This file contains support for writing coverage mapping data for // This file contains support for writing coverage mapping data for

// instrumentation based coverage. // instrumentation based coverage.

// //

//===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===//

#include "llvm/ProfileData/InstrProf.h"

#include "llvm/ProfileData/Coverage/CoverageMappingWriter.h" #include "llvm/ProfileData/Coverage/CoverageMappingWriter.h"

#include "llvm/ADT/ArrayRef.h" #include "llvm/ADT/ArrayRef.h"

#include "llvm/ADT/Optional.h"

#include "llvm/ADT/SmallVector.h" #include "llvm/ADT/SmallVector.h"

#include "llvm/ProfileData/InstrProf.h"

#include "llvm/Support/Compression.h" #include "llvm/Support/Compression.h"

#include "llvm/Support/LEB128.h" #include "llvm/Support/LEB128.h"

#include "llvm/Support/raw_ostream.h" #include "llvm/Support/raw_ostream.h"

#include <algorithm> #include <algorithm>

#include <cassert> #include <cassert>

#include <limits> #include <limits>

#include <vector> #include <vector>

using namespace llvm; using namespace llvm;

using namespace llvm::compression;

using namespace coverage; using namespace coverage;

CoverageFilenamesSectionWriter::CoverageFilenamesSectionWriter( CoverageFilenamesSectionWriter::CoverageFilenamesSectionWriter(

ArrayRef<std::string> Filenames) ArrayRef<std::string> Filenames)

: Filenames(Filenames) { : Filenames(Filenames) {

#ifndef NDEBUG #ifndef NDEBUG

StringSet<> NameSet; StringSet<> NameSet;

for (StringRef Name : Filenames) for (StringRef Name : Filenames)

assert(NameSet.insert(Name).second && "Duplicate filename"); assert(NameSet.insert(Name).second && "Duplicate filename");

#endif #endif

} }

void CoverageFilenamesSectionWriter::write(raw_ostream &OS, bool Compress) { void CoverageFilenamesSectionWriter::write(raw_ostream &OS, bool Compress) {

std::string FilenamesStr; std::string FilenamesStr;

{ {

raw_string_ostream FilenamesOS{FilenamesStr}; raw_string_ostream FilenamesOS{FilenamesStr};

for (const auto &Filename : Filenames) { for (const auto &Filename : Filenames) {

encodeULEB128(Filename.size(), FilenamesOS); encodeULEB128(Filename.size(), FilenamesOS);

FilenamesOS << Filename; FilenamesOS << Filename;

} }

SmallVector<uint8_t, 128> CompressedStr; SmallVector<uint8_t, 128> CompressedStr;

bool doCompression = Compress && compression::zlib::isAvailable() && CompressionKind CompressionScheme = CompressionKind::Zlib;

dblaikie (Not Done): This seems a bit too convoluted for me.

I'd think something like:

if (DoInstrProfNameCompression) {
  if (CompressionAlgorithm *C = getZlibCompressionAlgorithm())
    C->compress(...);
}

Or even have getCompressionAlgorithm(SupportCompressionType::Zlib) (that could be the only entry point, with no need for algorithm-specific accessors; the function would have one switch over SupportCompressionType, returning null if Unknown or Null were passed, or if the requested algorithm was not available).

I'm not sure I understand the 'when'/'whenSupported' stuff and whether there's any value/need for more details to be communicated in the not-available case other than 'false'/null/nothing (like, if it needs to communicate a reason for non-availability, that's more involved than returning null from some factory/accessor function).

DoInstrProfNameCompression; bool DoCompression =

if (doCompression) Compress && DoInstrProfNameCompression && CompressionScheme;

compression::zlib::compress(arrayRefFromStringRef(FilenamesStr), if (DoCompression) {

ckissane (Author, Done): note the helpers such as when(bool), whensupported() and notNone()

CompressionScheme->compress(arrayRefFromStringRef(FilenamesStr),

CompressedStr, CompressedStr,

dblaikie (Not Done), suggested edit:

SmallVector<uint8_t, 128> CompressedStr;

- compression::OptionalCompressionKind OptionalCompressionScheme =

- compression::CompressionKind::Zlib;

+ compression::CompressionKind CompressionScheme =

+ compression::CompressionKind::Zlib

+ bool doCompression = Compress && DoInstrProfNameCompression && CompressionScheme;

+ if (doCompression) {

+ CompressionScheme->compress(arrayRefFromStringRef(FilenamesStr),

- bool DoCompression = OptionalCompressionScheme && *OptionalCompressionScheme;

- if (DoCompression) {

- compression::CompressionKind CompressionScheme = *OptionalCompressionScheme;

- CompressionScheme->compress(arrayRefFromStringRef(FilenamesStr),

- CompressedStr,

+ compression::CompressionKind CompressionScheme =

+ compression::CompressionKind::Zlib

+ bool doCompression = Compress && DoInstrProfNameCompression && CompressionScheme;

+ if (doCompression) {

+ CompressionScheme->compress(arrayRefFromStringRef(FilenamesStr), CompressedStr,

Why/how does `OptionalCompressionKind` end up in here, compared with this (suggested edit)?

compression::zlib::BestSizeCompression); CompressionScheme->BestSizeLevel);

}

// ::= <num-filenames> // ::= <num-filenames>

dblaikie (Not Done): This still seems like a lot of hoops to jump through - why "noneIfUnsupported" rather than either having the compression scheme (I think it could be the CompressionAlgorithm itself, rather than having the separate OptionalCompressionKind abstraction) either be null itself, or expose an "isAvailable" operation directly on the CompressionAlgorithm?

Even if the CompressionKind/OptionalCompressionKind/CompressionAlgorithm abstractions are kept, I'm not sure why the above code is preferred over, say:

if (Compress && DoInstrProfNameCompression && OptionalCompressionScheme /* .isAvailable(), if we want to be more explicit */) {
  ...
}

What's the benefit that noneIfUnsupported is providing? (& generally I'd expect the Compress && DoInstrProfNameCompression to be tested/exit early before even naming/constructing/querying/doing anything with the compression scheme/algorithm/etc - so there'd be no need to combine the tests for availability and the tests for whether compression was requested)

Perhaps this API is motivated by a desire to implement something much closer to the original code than is necessary/suitable? Or some other use case/benefit I'm not quite understanding yet?

ckissaneAuthorUnsubmitted

Done

I shall remove noneIfUnsupported. You express good points, we can simply check if(OptionalCompressionScheme && *OptionalCompressionScheme) where necessary.

ckissane: I shall remove `noneIfUnsupported`. You express good points, we can simply check `if…

ckissaneAuthorUnsubmitted

Done

though it will make a lot of existing code patterns less clear, and more verbose

ckissane: though it will make a lot of existing code patterns less clear, and more verbose

ckissaneAuthorUnsubmitted

Done

and sometimes you really do need to re code the exact thing noneIfUnsupported encapsulates...

ckissane: and sometimes you really do need to re code the exact thing `noneIfUnsupported` encapsulates...

dblaikieUnsubmitted

Not Done

Are there examples within LLVM that you can show compare/contrast noneIfUnsupported helps?

dblaikie: Are there examples within LLVM that you can show compare/contrast `noneIfUnsupported` helps?

ckissaneAuthorUnsubmitted

Done

yes, I'll paste a couple here

ckissane: yes, I'll paste a couple here

ckissaneAuthorUnsubmitted

Done

Ok, So I believe I was mistaken.
In older versions of this patch there was a none compression implementation that just did a memcpy, this made a natural fall through state, which made this type of pattern advantageous.
However, this is no longer the case.
Hence I will remove this without further adue.
Thank you for your astute observation!

// <uncompressed-len> // <uncompressed-len>

// <compressed-len-or-zero> // <compressed-len-or-zero>

// (<compressed-filenames> | <uncompressed-filenames>) // (<compressed-filenames> | <uncompressed-filenames>)

encodeULEB128(Filenames.size(), OS); encodeULEB128(Filenames.size(), OS);

encodeULEB128(FilenamesStr.size(), OS); encodeULEB128(FilenamesStr.size(), OS);

encodeULEB128(doCompression ? CompressedStr.size() : 0U, OS); encodeULEB128(DoCompression ? CompressedStr.size() : 0U, OS);

OS << (doCompression ? toStringRef(CompressedStr) : StringRef(FilenamesStr)); OS << (DoCompression ? toStringRef(CompressedStr) : StringRef(FilenamesStr));

} }

namespace { namespace {

/// Gather only the expressions that are used by the mapping /// Gather only the expressions that are used by the mapping

/// regions in this function. /// regions in this function.

class CounterExpressionsMinimizer { class CounterExpressionsMinimizer {

ArrayRef<CounterExpression> Expressions; ArrayRef<CounterExpression> Expressions;

llvm/lib/ProfileData/InstrProf.cpp

#include <memory> #include <memory>

#include <string> #include <string>

#include <system_error> #include <system_error>

#include <type_traits> #include <type_traits>

#include <utility> #include <utility>

#include <vector> #include <vector>

using namespace llvm; using namespace llvm;

using namespace llvm::compression;

static cl::opt<bool> StaticFuncFullModulePrefix( static cl::opt<bool> StaticFuncFullModulePrefix(

"static-func-full-module-prefix", cl::init(true), cl::Hidden, "static-func-full-module-prefix", cl::init(true), cl::Hidden,

cl::desc("Use full module build paths in the profile counter names for " cl::desc("Use full module build paths in the profile counter names for "

"static functions.")); "static functions."));

// This option is tailored to users that have different top-level directory in // This option is tailored to users that have different top-level directory in

// profile-gen and profile-use compilation. Users need to specific the number // profile-gen and profile-use compilation. Users need to specific the number

uint64_t InstrProfSymtab::getFunctionHashFromAddress(uint64_t Address) {

// external functions that are not instrumented. They won't have // external functions that are not instrumented. They won't have

// mapping data to be used by the deserializer. Force the value to // mapping data to be used by the deserializer. Force the value to

// be 0 in this case. // be 0 in this case.

if (It != AddrToMD5Map.end() && It->first == Address) if (It != AddrToMD5Map.end() && It->first == Address)

return (uint64_t)It->second; return (uint64_t)It->second;

return 0; return 0;

} }

Error collectPGOFuncNameStrings(ArrayRef<std::string> NameStrs, Error collectPGOFuncNameStrings(

bool doCompression, std::string &Result) { ArrayRef<std::string> NameStrs,

OptionalCompressionKind OptionalCompressionScheme, std::string &Result) {

assert(!NameStrs.empty() && "No name data to emit"); assert(!NameStrs.empty() && "No name data to emit");

uint8_t Header[16], *P = Header; uint8_t Header[16], *P = Header;

std::string UncompressedNameStrings = std::string UncompressedNameStrings =

join(NameStrs.begin(), NameStrs.end(), getInstrProfNameSeparator()); join(NameStrs.begin(), NameStrs.end(), getInstrProfNameSeparator());

assert(StringRef(UncompressedNameStrings) assert(StringRef(UncompressedNameStrings)

.count(getInstrProfNameSeparator()) == (NameStrs.size() - 1) && .count(getInstrProfNameSeparator()) == (NameStrs.size() - 1) &&

"PGO name is invalid (contains separator token)"); "PGO name is invalid (contains separator token)");

unsigned EncLen = encodeULEB128(UncompressedNameStrings.length(), P); unsigned EncLen = encodeULEB128(UncompressedNameStrings.length(), P);

P += EncLen; P += EncLen;

auto WriteStringToResult = [&](size_t CompressedLen, StringRef InputStr) { auto WriteStringToResult = [&](size_t CompressedLen, StringRef InputStr) {

EncLen = encodeULEB128(CompressedLen, P); EncLen = encodeULEB128(CompressedLen, P);

P += EncLen; P += EncLen;

char *HeaderStr = reinterpret_cast<char *>(&Header[0]); char *HeaderStr = reinterpret_cast<char *>(&Header[0]);

unsigned HeaderLen = P - &Header[0]; unsigned HeaderLen = P - &Header[0];

Result.append(HeaderStr, HeaderLen); Result.append(HeaderStr, HeaderLen);

Result += InputStr; Result += InputStr;

return Error::success(); return Error::success();

}; };

if (!doCompression) { if ((!OptionalCompressionScheme) || (!(*OptionalCompressionScheme)))

dblaikieUnsubmitted

Not Done

Yeah, this seems awkward, but I see what you're getting at - if you're going to pass around an algorithm to use, and this particular kind of use case wants to collapse "algorithm not available" and "no compression was requested" into one state, then, yeah, for this sort of use case I can see the value in having a null compression algorithm implementation. That shouldn't add a lot of complexity to the implementation, and it would simplify the usage so it doesn't have two layers of "not present" state.
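One way to picture the null-algorithm idea is the sketch below. It is only an illustration under stated assumptions: `Algorithm`, `NullAlgorithm`, `NameBlob` and `emitNames` are hypothetical names, not types from this patch, and `std::vector` stands in for LLVM's buffer types. The only behaviour taken from the diff is that the compressed-length field is written as 0 when the names are stored uncompressed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical interface; not the CompressionAlgorithm class from the patch.
struct Algorithm {
  virtual ~Algorithm() = default;
  virtual bool isNull() const { return false; }
  virtual std::vector<uint8_t>
  compress(const std::vector<uint8_t> &In) const = 0;
};

// Null object: "compression" is a plain copy, so callers need no special case
// for "no compression requested" or "algorithm unavailable".
struct NullAlgorithm final : Algorithm {
  bool isNull() const override { return true; }
  std::vector<uint8_t>
  compress(const std::vector<uint8_t> &In) const override {
    return In;
  }
};

struct NameBlob {
  std::size_t CompressedLenField; // 0 means the payload is stored uncompressed
  std::vector<uint8_t> Payload;
};

// One code path for both cases; only the length field differs.
NameBlob emitNames(const Algorithm &A, const std::vector<uint8_t> &Names) {
  std::vector<uint8_t> Payload = A.compress(Names);
  std::size_t Len = A.isNull() ? 0 : Payload.size();
  return {Len, std::move(Payload)};
}
```

With this shape, a caller that would otherwise pass an empty optional passes a `NullAlgorithm` instead, and the writer never branches on whether a scheme is present.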

return WriteStringToResult(0, UncompressedNameStrings); return WriteStringToResult(0, UncompressedNameStrings);

} CompressionKind CompressionScheme = *OptionalCompressionScheme;

SmallVector<uint8_t, 128> CompressedNameStrings; SmallVector<uint8_t, 128> CompressedNameStrings;

compression::zlib::compress(arrayRefFromStringRef(UncompressedNameStrings), CompressionScheme->compress(arrayRefFromStringRef(UncompressedNameStrings),

CompressedNameStrings, CompressedNameStrings,

compression::zlib::BestSizeCompression); CompressionScheme->BestSizeLevel);

return WriteStringToResult(CompressedNameStrings.size(), return WriteStringToResult(CompressedNameStrings.size(),

toStringRef(CompressedNameStrings)); toStringRef(CompressedNameStrings));

} }

StringRef getPGOFuncNameVarInitializer(GlobalVariable *NameVar) { StringRef getPGOFuncNameVarInitializer(GlobalVariable *NameVar) {

auto *Arr = cast<ConstantDataArray>(NameVar->getInitializer()); auto *Arr = cast<ConstantDataArray>(NameVar->getInitializer());

StringRef NameStr = StringRef NameStr =

Arr->isCString() ? Arr->getAsCString() : Arr->getAsString(); Arr->isCString() ? Arr->getAsCString() : Arr->getAsString();

return NameStr; return NameStr;

} }

Error collectPGOFuncNameStrings(ArrayRef<GlobalVariable *> NameVars, Error collectPGOFuncNameStrings(ArrayRef<GlobalVariable *> NameVars,

std::string &Result, bool doCompression) { std::string &Result, bool doCompression) {

std::vector<std::string> NameStrs; std::vector<std::string> NameStrs;

for (auto *NameVar : NameVars) { for (auto *NameVar : NameVars) {

NameStrs.push_back(std::string(getPGOFuncNameVarInitializer(NameVar))); NameStrs.push_back(std::string(getPGOFuncNameVarInitializer(NameVar)));

} }

OptionalCompressionKind OptionalCompressionScheme = CompressionKind::Zlib;

return collectPGOFuncNameStrings( return collectPGOFuncNameStrings(

NameStrs, compression::zlib::isAvailable() && doCompression, Result); NameStrs, doCompression ? OptionalCompressionScheme : llvm::NoneType(),

Result);

} }

dblaikieUnsubmitted

Not Done

return collectPGOFuncNameStrings(NameStrs,

- doCompression && OptionalCompressionScheme

+ doCompression

? OptionalCompressionScheme

Presumably this doesn't need to test `OptionalCompressionScheme` here, though, since it's tested in the implementation?

Error readPGOFuncNameStrings(StringRef NameStrings, InstrProfSymtab &Symtab) { Error readPGOFuncNameStrings(StringRef NameStrings, InstrProfSymtab &Symtab) {

const uint8_t *P = NameStrings.bytes_begin(); const uint8_t *P = NameStrings.bytes_begin();

const uint8_t *EndP = NameStrings.bytes_end(); const uint8_t *EndP = NameStrings.bytes_end();

while (P < EndP) { while (P < EndP) {

uint32_t N; uint32_t N;

uint64_t UncompressedSize = decodeULEB128(P, &N); uint64_t UncompressedSize = decodeULEB128(P, &N);

P += N; P += N;

uint64_t CompressedSize = decodeULEB128(P, &N); uint64_t CompressedSize = decodeULEB128(P, &N);

P += N; P += N;

bool isCompressed = (CompressedSize != 0); bool isCompressed = (CompressedSize != 0);

SmallVector<uint8_t, 128> UncompressedNameStrings; SmallVector<uint8_t, 128> UncompressedNameStrings;

StringRef NameStrings; StringRef NameStrings;

if (isCompressed) { if (isCompressed) {

if (!llvm::compression::zlib::isAvailable()) CompressionKind CompressionScheme = CompressionKind::Zlib;

if (!CompressionScheme)

return make_error<InstrProfError>(instrprof_error::zlib_unavailable); return make_error<InstrProfError>(instrprof_error::zlib_unavailable);

if (Error E = compression::zlib::uncompress( if (Error E = CompressionScheme->decompress(

makeArrayRef(P, CompressedSize), UncompressedNameStrings, makeArrayRef(P, CompressedSize), UncompressedNameStrings,

UncompressedSize)) { UncompressedSize)) {

consumeError(std::move(E)); consumeError(std::move(E));

return make_error<InstrProfError>(instrprof_error::uncompress_failed); return make_error<InstrProfError>(instrprof_error::uncompress_failed);

} }

P += CompressedSize; P += CompressedSize;

NameStrings = toStringRef(UncompressedNameStrings); NameStrings = toStringRef(UncompressedNameStrings);

} else { } else {

llvm/lib/ProfileData/InstrProfCorrelator.cpp

	//===-- InstrProfCorrelator.cpp -------------------------------------------===//			//===-- InstrProfCorrelator.cpp -------------------------------------------===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "llvm/ProfileData/InstrProfCorrelator.h"			#include "llvm/ProfileData/InstrProfCorrelator.h"
				#include "llvm/ADT/Optional.h"
	#include "llvm/DebugInfo/DIContext.h"			#include "llvm/DebugInfo/DIContext.h"
	#include "llvm/DebugInfo/DWARF/DWARFContext.h"			#include "llvm/DebugInfo/DWARF/DWARFContext.h"
	#include "llvm/DebugInfo/DWARF/DWARFDie.h"			#include "llvm/DebugInfo/DWARF/DWARFDie.h"
	#include "llvm/DebugInfo/DWARF/DWARFExpression.h"			#include "llvm/DebugInfo/DWARF/DWARFExpression.h"
	#include "llvm/DebugInfo/DWARF/DWARFFormValue.h"			#include "llvm/DebugInfo/DWARF/DWARFFormValue.h"
	#include "llvm/DebugInfo/DWARF/DWARFLocationExpression.h"			#include "llvm/DebugInfo/DWARF/DWARFLocationExpression.h"
	#include "llvm/DebugInfo/DWARF/DWARFUnit.h"			#include "llvm/DebugInfo/DWARF/DWARFUnit.h"
	#include "llvm/Object/MachO.h"			#include "llvm/Object/MachO.h"
				#include "llvm/Support/Compression.h"
	#include "llvm/Support/Debug.h"			#include "llvm/Support/Debug.h"

	#define DEBUG_TYPE "correlator"			#define DEBUG_TYPE "correlator"

	using namespace llvm;			using namespace llvm;

	/// Get the __llvm_prf_cnts section.			/// Get the __llvm_prf_cnts section.
	Expected<object::SectionRef> getCountersSection(const object::ObjectFile &Obj) {			Expected<object::SectionRef> getCountersSection(const object::ObjectFile &Obj) {
	Error InstrProfCorrelatorImpl<IntPtrT>::correlateProfileData() {			Error InstrProfCorrelatorImpl<IntPtrT>::correlateProfileData() {
	assert(Data.empty() && Names.empty() && NamesVec.empty());			assert(Data.empty() && Names.empty() && NamesVec.empty());
	correlateProfileDataImpl();			correlateProfileDataImpl();
	if (Data.empty() || NamesVec.empty())			if (Data.empty() || NamesVec.empty())
	return make_error<InstrProfError>(			return make_error<InstrProfError>(
	instrprof_error::unable_to_correlate_profile,			instrprof_error::unable_to_correlate_profile,
	"could not find any profile metadata in debug info");			"could not find any profile metadata in debug info");
	auto Result =			auto Result =
	collectPGOFuncNameStrings(NamesVec, /*doCompression=*/false, Names);			collectPGOFuncNameStrings(NamesVec,
				/*CompressionScheme=*/llvm::NoneType(), Names);
	CounterOffsets.clear();			CounterOffsets.clear();
	NamesVec.clear();			NamesVec.clear();
	return Result;			return Result;
	}			}

	template <class IntPtrT>			template <class IntPtrT>
	void InstrProfCorrelatorImpl<IntPtrT>::addProbe(StringRef FunctionName,			void InstrProfCorrelatorImpl<IntPtrT>::addProbe(StringRef FunctionName,
	uint64_t CFGHash,			uint64_t CFGHash,

llvm/lib/ProfileData/SampleProfReader.cpp

#include <cstddef>		#include <cstddef>
#include <cstdint>		#include <cstdint>
#include <limits>		#include <limits>
#include <memory>		#include <memory>
#include <system_error>		#include <system_error>
#include <vector>		#include <vector>

using namespace llvm;		using namespace llvm;
		using namespace llvm::compression;
using namespace sampleprof;		using namespace sampleprof;

#define DEBUG_TYPE "samplepgo-reader"		#define DEBUG_TYPE "samplepgo-reader"

// This internal option specifies if the profile uses FS discriminators.		// This internal option specifies if the profile uses FS discriminators.
// It only applies to text, binary and compact binary format profiles.		// It only applies to text, binary and compact binary format profiles.
// For ext-binary format profiles, the flag is set in the summary.		// For ext-binary format profiles, the flag is set in the summary.
static cl::opt<bool> ProfileIsFSDisciminator(		static cl::opt<bool> ProfileIsFSDisciminator(
std::error_code SampleProfileReaderExtBinaryBase::decompressSection(
auto DecompressSize = readNumber<uint64_t>();		auto DecompressSize = readNumber<uint64_t>();
if (std::error_code EC = DecompressSize.getError())		if (std::error_code EC = DecompressSize.getError())
return EC;		return EC;
DecompressBufSize = *DecompressSize;		DecompressBufSize = *DecompressSize;

auto CompressSize = readNumber<uint64_t>();		auto CompressSize = readNumber<uint64_t>();
if (std::error_code EC = CompressSize.getError())		if (std::error_code EC = CompressSize.getError())
return EC;		return EC;

if (!llvm::compression::zlib::isAvailable())		if (CompressionKind CompressionScheme = CompressionKind::Zlib) {
return sampleprof_error::zlib_unavailable;

uint8_t *Buffer = Allocator.Allocate<uint8_t>(DecompressBufSize);		uint8_t *Buffer = Allocator.Allocate<uint8_t>(DecompressBufSize);
		dblaikieUnsubmitted
		Not Done
		Nice to pull out a common variable rather than accessing `CompressionKind::Zlib` twice independently (otherwise we probably might as well go with the more direct API @MaskRay has proposed)
size_t UCSize = DecompressBufSize;		size_t UCSize = DecompressBufSize;
llvm::Error E = compression::zlib::uncompress(		llvm::Error E = CompressionScheme->decompress(
makeArrayRef(Data, *CompressSize), Buffer, UCSize);		makeArrayRef(Data, *CompressSize), Buffer, UCSize);
if (E)		if (E)
return sampleprof_error::uncompress_failed;		return sampleprof_error::uncompress_failed;
DecompressBuf = reinterpret_cast<const uint8_t *>(Buffer);		DecompressBuf = reinterpret_cast<const uint8_t *>(Buffer);
return sampleprof_error::success;		return sampleprof_error::success;
}		}
		return sampleprof_error::zlib_unavailable;
		}

std::error_code SampleProfileReaderExtBinaryBase::readImpl() {		std::error_code SampleProfileReaderExtBinaryBase::readImpl() {
const uint8_t *BufStart =		const uint8_t *BufStart =
reinterpret_cast<const uint8_t *>(Buffer->getBufferStart());		reinterpret_cast<const uint8_t *>(Buffer->getBufferStart());

for (auto &Entry : SecHdrTable) {		for (auto &Entry : SecHdrTable) {
// Skip empty section.		// Skip empty section.
if (!Entry.Size)		if (!Entry.Size)

llvm/lib/ProfileData/SampleProfWriter.cpp

SampleProfileWriterExtBinaryBase::markSectionStart(SecType Type,
assert(Entry.Type == Type && "Unexpected section type");		assert(Entry.Type == Type && "Unexpected section type");
// Use LocalBuf as a temporary output for writting data.		// Use LocalBuf as a temporary output for writting data.
if (hasSecFlag(Entry, SecCommonFlags::SecFlagCompress))		if (hasSecFlag(Entry, SecCommonFlags::SecFlagCompress))
LocalBufStream.swap(OutputStream);		LocalBufStream.swap(OutputStream);
return SectionStart;		return SectionStart;
}		}

std::error_code SampleProfileWriterExtBinaryBase::compressAndOutput() {		std::error_code SampleProfileWriterExtBinaryBase::compressAndOutput() {
if (!llvm::compression::zlib::isAvailable())		compression::CompressionKind CompressionScheme =
		compression::CompressionKind::Zlib;
		if (!CompressionScheme)
return sampleprof_error::zlib_unavailable;		return sampleprof_error::zlib_unavailable;
std::string &UncompressedStrings =		std::string &UncompressedStrings =
static_cast<raw_string_ostream *>(LocalBufStream.get())->str();		static_cast<raw_string_ostream *>(LocalBufStream.get())->str();
if (UncompressedStrings.size() == 0)		if (UncompressedStrings.size() == 0)
return sampleprof_error::success;		return sampleprof_error::success;
auto &OS = *OutputStream;		auto &OS = *OutputStream;
SmallVector<uint8_t, 128> CompressedStrings;		SmallVector<uint8_t, 128> CompressedStrings;
compression::zlib::compress(arrayRefFromStringRef(UncompressedStrings),		CompressionScheme->compress(arrayRefFromStringRef(UncompressedStrings),
CompressedStrings,		CompressedStrings,
compression::zlib::BestSizeCompression);		CompressionScheme->BestSizeLevel);
encodeULEB128(UncompressedStrings.size(), OS);		encodeULEB128(UncompressedStrings.size(), OS);
encodeULEB128(CompressedStrings.size(), OS);		encodeULEB128(CompressedStrings.size(), OS);
OS << toStringRef(CompressedStrings);		OS << toStringRef(CompressedStrings);
UncompressedStrings.clear();		UncompressedStrings.clear();
return sampleprof_error::success;		return sampleprof_error::success;
}		}

/// Add a new section into section header table given the section type		/// Add a new section into section header table given the section type

llvm/lib/Support/Compression.cpp

	#endif			#endif
	#if LLVM_ENABLE_ZSTD			#if LLVM_ENABLE_ZSTD
	#include <zstd.h>			#include <zstd.h>
	#endif			#endif

	using namespace llvm;			using namespace llvm;
	using namespace llvm::compression;			using namespace llvm::compression;

				namespace {

	#if LLVM_ENABLE_ZLIB			#if LLVM_ENABLE_ZLIB
				leonardchanUnsubmitted
				Not Done
				Perhaps for each of these, you could instead have something like:
				  ZStdCompressionAlgorithm *getZStdCompressionAlgorithm() {
				    static ZStdCompressionAlgorithm *instance = new ZStdCompressionAlgorithm;
				    return instance;
				  }
				This way the instances are only new'd when they're actually used.
				dblaikieUnsubmitted
				Not Done
				Yep, I'd mentioned/suggested that (so, seconding here) elsewhere encouraging these to be singletons: https://reviews.llvm.org/D130516#3683384
				And they don't even need to be 'new'd in that case, this would be fine:
				  ZstdCompressionAlgorithm &getZstdCompressionAlgorithm() {
				    static ZstdCompressionAlgorithm C;
				    return C;
				  }
				Though I think maybe we don't need individual access to the algorithms, and it'd be fine to have only a single entry point like this:
				  CompressionAlgorithm *getCompressionAlgorithm(DebugCompressionType T) {
				    switch (T) {
				    case DebugCompressionType::ZStd: {
				      static zstd::CompressionAlgorithm Zstd;
				      if (zstd::isAvailable())
				        return &Zstd;
				    }
				    ...
				    }
				    return nullptr;
				  }
				(or, possibly, we want to return non-null even if it isn't available, if we include other things (like the configure macro name - so callers can use that name to print helpful error messages - but then they have to explicitly check if the algorithm is available after the call))
				ckissaneAuthorUnsubmitted
				Done
				They already have singleton behavior, i.e. `llvm::compression::ZStdCompressionAlgorithm::Instance` is the only place a `new ZStdCompressionAlgorithm()` can be stored, because the constructor is protected. I'd rather not pursue "This way the instances are only new'd when they're actually used.": the rewards are relatively small, and it would make the code more verbose. I think the current pattern gives the best of both worlds: it reads like the namespace approach (`llvm::compression::zlib::compress` becomes `llvm::compression::ZlibCompression->compress`), but the algorithms can be passed around as class instances.
				dblaikieUnsubmitted
				Not Done
				Global constructors are to be avoided in LLVM: https://llvm.org/docs/CodingStandards.html#do-not-use-static-constructors (also these objects don't need to be dynamically allocated with `new` - they can be directly allocated (as static locals though, not as globals))
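The single-entry-point shape suggested in this thread can be sketched in standalone C++ as follows. Everything here is a simplified assumption for illustration: `Kind`, `Algorithm` and the hard-coded `Available` flags are hypothetical stand-ins (in LLVM the availability would come from the build configuration), and only the use of function-local statics instead of globals reflects the review comments.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the enum and algorithm class under discussion.
enum class Kind : uint8_t { Zlib = 1, Zstd = 2 };

struct Algorithm {
  const char *Name;
  bool Available; // assumption: fixed here; really a build-time property
};

// Single entry point. Each algorithm object is a function-local static, so no
// global constructor runs at program start-up and nothing is heap-allocated.
// Returns nullptr when the requested algorithm is not available.
const Algorithm *getCompressionAlgorithm(Kind K) {
  switch (K) {
  case Kind::Zlib: {
    static const Algorithm Zlib{"zlib", true};
    return Zlib.Available ? &Zlib : nullptr;
  }
  case Kind::Zstd: {
    static const Algorithm Zstd{"zstd", false};
    return Zstd.Available ? &Zstd : nullptr;
  }
  }
  return nullptr;
}
```

Callers then need only one null check, for example `if (const Algorithm *A = getCompressionAlgorithm(Kind::Zlib)) { ... }`, and repeated calls return the same object.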

	static StringRef convertZlibCodeToString(int Code) {			static StringRef convertZlibCodeToString(int Code) {
	switch (Code) {			switch (Code) {
	case Z_MEM_ERROR:			case Z_MEM_ERROR:
	return "zlib error: Z_MEM_ERROR";			return "zlib error: Z_MEM_ERROR";
	case Z_BUF_ERROR:			case Z_BUF_ERROR:
	return "zlib error: Z_BUF_ERROR";			return "zlib error: Z_BUF_ERROR";
	case Z_STREAM_ERROR:			case Z_STREAM_ERROR:
	return "zlib error: Z_STREAM_ERROR";			return "zlib error: Z_STREAM_ERROR";
	case Z_DATA_ERROR:			case Z_DATA_ERROR:
	return "zlib error: Z_DATA_ERROR";			return "zlib error: Z_DATA_ERROR";
	case Z_OK:			case Z_OK:
	default:			default:
	llvm_unreachable("unknown or unexpected zlib status code");			llvm_unreachable("unknown or unexpected zlib status code");
	}			}
	}			}
				#endif
				struct ZlibCompressionAlgorithm : public CompressionAlgorithm {
				#if LLVM_ENABLE_ZLIB

	bool zlib::isAvailable() { return true; }			void compress(ArrayRef<uint8_t> Input,

	void zlib::compress(ArrayRef<uint8_t> Input,
	SmallVectorImpl<uint8_t> &CompressedBuffer, int Level) {			SmallVectorImpl<uint8_t> &CompressedBuffer, int Level) {
	unsigned long CompressedSize = ::compressBound(Input.size());			unsigned long CompressedSize = ::compressBound(Input.size());
	CompressedBuffer.resize_for_overwrite(CompressedSize);			CompressedBuffer.resize_for_overwrite(CompressedSize);
	int Res = ::compress2((Bytef *)CompressedBuffer.data(), &CompressedSize,			int Res = ::compress2((Bytef *)CompressedBuffer.data(), &CompressedSize,
	(const Bytef *)Input.data(), Input.size(), Level);			(const Bytef *)Input.data(), Input.size(), Level);
	if (Res == Z_MEM_ERROR)			if (Res == Z_MEM_ERROR)
	report_bad_alloc_error("Allocation failed");			report_bad_alloc_error("Allocation failed");
	assert(Res == Z_OK);			assert(Res == Z_OK);
	// Tell MemorySanitizer that zlib output buffer is fully initialized.			// Tell MemorySanitizer that zlib output buffer is fully initialized.
	// This avoids a false report when running LLVM with uninstrumented ZLib.			// This avoids a false report when running LLVM with uninstrumented ZLib.
	__msan_unpoison(CompressedBuffer.data(), CompressedSize);			__msan_unpoison(CompressedBuffer.data(), CompressedSize);
	if (CompressedSize < CompressedBuffer.size())			if (CompressedSize < CompressedBuffer.size())
	CompressedBuffer.truncate(CompressedSize);			CompressedBuffer.truncate(CompressedSize);
	}			};
				Error decompress(ArrayRef<uint8_t> Input, uint8_t *UncompressedBuffer,
				leonardchanUnsubmitted
				Not Done
				Does `NoneCompressionAlgorithm` imply there's no compression at all? If so, I would think these methods should be empty.
	Error zlib::uncompress(ArrayRef<uint8_t> Input, uint8_t *UncompressedBuffer,
	size_t &UncompressedSize) {			size_t &UncompressedSize) {
	int Res =			int Res =
	::uncompress((Bytef *)UncompressedBuffer, (uLongf *)&UncompressedSize,			::uncompress((Bytef *)UncompressedBuffer, (uLongf *)&UncompressedSize,
	(const Bytef *)Input.data(), Input.size());			(const Bytef *)Input.data(), Input.size());
	// Tell MemorySanitizer that zlib output buffer is fully initialized.			// Tell MemorySanitizer that zlib output buffer is fully initialized.
	// This avoids a false report when running LLVM with uninstrumented ZLib.			// This avoids a false report when running LLVM with uninstrumented ZLib.
	__msan_unpoison(UncompressedBuffer, UncompressedSize);			__msan_unpoison(UncompressedBuffer, UncompressedSize);
	return Res ? make_error<StringError>(convertZlibCodeToString(Res),			return Res ? make_error<StringError>(convertZlibCodeToString(Res),
	inconvertibleErrorCode())			inconvertibleErrorCode())
	: Error::success();			: Error::success();
	}			};
				leonardchanUnsubmitted
				Not Done
				nit: add `override`s to be more explicit these are virtual methods

	Error zlib::uncompress(ArrayRef<uint8_t> Input,
	SmallVectorImpl<uint8_t> &UncompressedBuffer,
	size_t UncompressedSize) {
	UncompressedBuffer.resize_for_overwrite(UncompressedSize);
	Error E =
	zlib::uncompress(Input, UncompressedBuffer.data(), UncompressedSize);
	if (UncompressedSize < UncompressedBuffer.size())
	UncompressedBuffer.truncate(UncompressedSize);
	return E;
	}

	#else			#else
	bool zlib::isAvailable() { return false; }
	void zlib::compress(ArrayRef<uint8_t> Input,			void compress(ArrayRef<uint8_t> Input,
	SmallVectorImpl<uint8_t> &CompressedBuffer, int Level) {			SmallVectorImpl<uint8_t> &CompressedBuffer, int Level) {
	llvm_unreachable("zlib::compress is unavailable");			llvm_unreachable("method:\"compress\" is unsupported for compression "
	}			"algorithm:\"zlib\", "
	Error zlib::uncompress(ArrayRef<uint8_t> Input, uint8_t *UncompressedBuffer,			"reason:\"llvm not compiled with zlib support\"");
				};
				Error decompress(ArrayRef<uint8_t> Input, uint8_t *UncompressedBuffer,
	size_t &UncompressedSize) {			size_t &UncompressedSize) {
	llvm_unreachable("zlib::uncompress is unavailable");			llvm_unreachable(
	}			"method:\"decompress\" is unsupported for compression "
	Error zlib::uncompress(ArrayRef<uint8_t> Input,			"algorithm:\"zlib\", reason:\"llvm not compiled with zlib support\"");
	SmallVectorImpl<uint8_t> &UncompressedBuffer,			};
	size_t UncompressedSize) {
				leonardchanUnsubmitted
				Not Done
				If the `llvm_unreachable`s should be the default implementation for all subclasses, perhaps the `[de]compress` methods should be regular virtual with these default implementations rather than abstract virtual.
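A rough sketch of that suggestion, assuming a simplified interface: the base class supplies the "unsupported" behaviour once, and only algorithms that are actually built in override it. The names `Algorithm`, `Unavailable` and `Identity` are hypothetical, and where the patch calls `llvm_unreachable`, this standalone version returns `false` with an error string so that it can run outside LLVM.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical base class with a regular (non-pure) virtual method whose
// default implementation reports that the operation is unsupported.
struct Algorithm {
  virtual ~Algorithm() = default;
  virtual bool compress(const std::vector<uint8_t> & /*In*/,
                        std::vector<uint8_t> & /*Out*/,
                        std::string &Err) const {
    Err = "compress is unsupported for this algorithm";
    return false;
  }
};

// An algorithm that was not compiled in simply inherits the default.
struct Unavailable final : Algorithm {};

// A supported algorithm overrides it; `override` makes the intent explicit.
struct Identity final : Algorithm {
  bool compress(const std::vector<uint8_t> &In, std::vector<uint8_t> &Out,
                std::string &Err) const override {
    Out = In; // placeholder for real compression work
    Err.clear();
    return true;
  }
};
```

This removes the duplicated `#else` bodies per algorithm, at the cost of making it possible to forget an override in a supported algorithm without a compile error.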
	llvm_unreachable("zlib::uncompress is unavailable");
	}
	#endif			#endif

	#if LLVM_ENABLE_ZSTD			protected:
				friend CompressionAlgorithm *CompressionKind::operator->() const;
				ZlibCompressionAlgorithm() : CompressionAlgorithm("zlib", 1, 6, 9) {}
				};

				dblaikieUnsubmitted
				Not Done
				Maybe these don't need to be static members - if there are singleton instances of the algorithms, they could be members of those singletons instead (possibly in the base/impl class - the derived classes could pass these values into the base ctor to initialize members in the impl or base - they could even be const public members, avoiding the need for accessors (at least avoiding the need for virtual accessors, but hopefully avoiding accessors entirely))
				dblaikieUnsubmitted
				Not Done
				I don't think there's particular value in these being constexpr members - and maybe we don't need these at all just yet/could leave them out for now? It'd be great to reduce this whole patch to something more comparable with https://reviews.llvm.org/D130458
				If you have plans for these other properties it might be helpful to understand what they are - they might help inform the design discussion.
				(if we are keeping these properties, including the string version of the name, etc - I'd think the way to do it would be for the base algorithm class to have non-static members to store these, and derived algorithm classes to pass the values into the base ctor to be stored in the members - they could even be const public members of the algorithm to be accessed directly, rather than via accessor functions (& certainly not virtual accessor functions))
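The layout described in these two comments might look like the sketch below. It is an illustration under assumptions: `AlgorithmBase`, `ZlibLike` and `getZlibLike` are hypothetical names, and the meaning assigned to the three integers (best speed, default, best size) is a guess based on the `("zlib", 1, 6, 9)` constructor call and the `BestSizeLevel` member visible in the diff.

```cpp
#include <cassert>

// Hypothetical base class: properties are plain const public members that the
// derived class passes to the base constructor. No static members and no
// (virtual) accessor functions are needed.
struct AlgorithmBase {
  const char *const Name;
  const int BestSpeedLevel; // assumed meaning of the first integer
  const int DefaultLevel;   // assumed meaning of the second integer
  const int BestSizeLevel;  // member name taken from the diff

protected:
  AlgorithmBase(const char *N, int Speed, int Default, int Size)
      : Name(N), BestSpeedLevel(Speed), DefaultLevel(Default),
        BestSizeLevel(Size) {}
};

struct ZlibLike final : AlgorithmBase {
  ZlibLike() : AlgorithmBase("zlib", 1, 6, 9) {}
};

// Singleton access through a function-local static rather than a global.
const ZlibLike &getZlibLike() {
  static const ZlibLike Z;
  return Z;
}
```

Call sites read the values directly, for example `getZlibLike().BestSizeLevel`, which keeps usage close to the `CompressionScheme->BestSizeLevel` form in the diff.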
	bool zstd::isAvailable() { return true; }			struct ZStdCompressionAlgorithm : public CompressionAlgorithm {
				#if LLVM_ENABLE_ZSTD

	void zstd::compress(ArrayRef<uint8_t> Input,			void compress(ArrayRef<uint8_t> Input,
	SmallVectorImpl<uint8_t> &CompressedBuffer, int Level) {			SmallVectorImpl<uint8_t> &CompressedBuffer, int Level) {
	unsigned long CompressedBufferSize = ::ZSTD_compressBound(Input.size());			unsigned long CompressedBufferSize = ::ZSTD_compressBound(Input.size());
	CompressedBuffer.resize_for_overwrite(CompressedBufferSize);			CompressedBuffer.resize_for_overwrite(CompressedBufferSize);
	unsigned long CompressedSize =			unsigned long CompressedSize =
	::ZSTD_compress((char *)CompressedBuffer.data(), CompressedBufferSize,			::ZSTD_compress((char *)CompressedBuffer.data(), CompressedBufferSize,
	(const char *)Input.data(), Input.size(), Level);			(const char *)Input.data(), Input.size(), Level);
	if (ZSTD_isError(CompressedSize))			if (ZSTD_isError(CompressedSize))
	report_bad_alloc_error("Allocation failed");			report_bad_alloc_error("Allocation failed");
	// Tell MemorySanitizer that zstd output buffer is fully initialized.			// Tell MemorySanitizer that zstd output buffer is fully initialized.
	// This avoids a false report when running LLVM with uninstrumented ZLib.			// This avoids a false report when running LLVM with uninstrumented ZLib.
	__msan_unpoison(CompressedBuffer.data(), CompressedSize);			__msan_unpoison(CompressedBuffer.data(), CompressedSize);
	if (CompressedSize < CompressedBuffer.size())			if (CompressedSize < CompressedBuffer.size())
	CompressedBuffer.truncate(CompressedSize);			CompressedBuffer.truncate(CompressedSize);
	}			};
				Error decompress(ArrayRef<uint8_t> Input, uint8_t *UncompressedBuffer,
	Error zstd::uncompress(ArrayRef<uint8_t> Input, uint8_t *UncompressedBuffer,
	size_t &UncompressedSize) {			size_t &UncompressedSize) {
	const size_t Res =			const size_t Res =
	::ZSTD_decompress(UncompressedBuffer, UncompressedSize,			::ZSTD_decompress(UncompressedBuffer, UncompressedSize,
	(const uint8_t *)Input.data(), Input.size());			(const uint8_t *)Input.data(), Input.size());
	UncompressedSize = Res;			UncompressedSize = Res;
	// Tell MemorySanitizer that zstd output buffer is fully initialized.			// Tell MemorySanitizer that zstd output buffer is fully initialized.
	// This avoids a false report when running LLVM with uninstrumented ZLib.			// This avoids a false report when running LLVM with uninstrumented ZLib.
	__msan_unpoison(UncompressedBuffer, UncompressedSize);			__msan_unpoison(UncompressedBuffer, UncompressedSize);
	return ZSTD_isError(Res) ? make_error<StringError>(ZSTD_getErrorName(Res),			return ZSTD_isError(Res) ? make_error<StringError>(ZSTD_getErrorName(Res),
	inconvertibleErrorCode())			inconvertibleErrorCode())
	: Error::success();			: Error::success();
	}			};

	Error zstd::uncompress(ArrayRef<uint8_t> Input,
	SmallVectorImpl<uint8_t> &UncompressedBuffer,
	size_t UncompressedSize) {
	UncompressedBuffer.resize_for_overwrite(UncompressedSize);
	Error E =
	zstd::uncompress(Input, UncompressedBuffer.data(), UncompressedSize);
	if (UncompressedSize < UncompressedBuffer.size())
	UncompressedBuffer.truncate(UncompressedSize);
	return E;
	}

	#else			#else
	bool zstd::isAvailable() { return false; }
	void zstd::compress(ArrayRef<uint8_t> Input,			void compress(ArrayRef<uint8_t> Input,
	SmallVectorImpl<uint8_t> &CompressedBuffer, int Level) {			SmallVectorImpl<uint8_t> &CompressedBuffer, int Level) {
	llvm_unreachable("zstd::compress is unavailable");			llvm_unreachable("method:\"compress\" is unsupported for compression "
	}			"algorithm:\"zstd\", "
	Error zstd::uncompress(ArrayRef<uint8_t> Input, uint8_t *UncompressedBuffer,			"reason:\"llvm not compiled with zstd support\"");
				};
				Error decompress(ArrayRef<uint8_t> Input, uint8_t *UncompressedBuffer,
				size_t &UncompressedSize) {
				llvm_unreachable(
				"method:\"decompress\" is unsupported for compression "
				"algorithm:\"zstd\", reason:\"llvm not compiled with zstd support\"");
				};

				#endif

				protected:
				leonardchan: I think this cast might not be needed
				friend CompressionAlgorithm *CompressionKind::operator->() const;
				ZStdCompressionAlgorithm() : CompressionAlgorithm("zstd", 1, 5, 12) {}
				};
				leonardchan: Same here

				struct UnknownCompressionAlgorithm : public CompressionAlgorithm {

				void compress(ArrayRef<uint8_t> Input,
				SmallVectorImpl<uint8_t> &CompressedBuffer, int Level) {
				llvm_unreachable("method:\"compress\" is unsupported for compression "
				"algorithm:\"unknown\", reason:\"can't call on unknown\"");
				};
				Error decompress(ArrayRef<uint8_t> Input, uint8_t *UncompressedBuffer,
	size_t &UncompressedSize) {			size_t &UncompressedSize) {
	llvm_unreachable("zstd::uncompress is unavailable");			llvm_unreachable("method:\"decompress\" is unsupported for compression "
				"algorithm:\"unknown\", reason:\"can't call on unknown\"");
	}			}
	Error zstd::uncompress(ArrayRef<uint8_t> Input,
	SmallVectorImpl<uint8_t> &UncompressedBuffer,			protected:
	size_t UncompressedSize) {			friend CompressionAlgorithm *CompressionKind::operator->() const;
				leonardchan: Is the purpose of `UnknownCompressionAlgorithm` to be the default instance here? If so, would it be better perhaps to just omit this and have an `llvm_unreachable` in the `default` case below? I would assume users of this function should just have the right compression scheme ID they need, and any error checking on whether something is a valid ID would be done before calling this.
	llvm_unreachable("zstd::uncompress is unavailable");			UnknownCompressionAlgorithm()
				: CompressionAlgorithm("unknown", -999, -999, -999) {}
				};

				} // namespace

				CompressionAlgorithm *CompressionKind::operator->() const {
				switch (uint8_t(CompressionID)) {
				case uint8_t(CompressionKind::Zlib):
				static ZlibCompressionAlgorithm ZlibI;
				return &ZlibI;
				leonardchan: https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements or perhaps just `return left ? left : NoneType();`
				case uint8_t(CompressionKind::ZStd):
				static ZStdCompressionAlgorithm ZStdI;
				return &ZStdI;
				default:
				static UnknownCompressionAlgorithm UnknownI;
				return &UnknownI;
	}			}
	#endif			}
				No newline at end of file
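
To make the review comment about `UnknownCompressionAlgorithm` concrete, here is a minimal, self-contained sketch of the "fake enum" dispatch with the `default` case treated as unreachable instead of returning a placeholder instance. It does not use LLVM headers: `Kind` and `Algorithm` are hypothetical stand-ins for `CompressionKind` and `CompressionAlgorithm`, `assert` plus `std::abort` stands in for `llvm_unreachable`, and the `Supported` flags are hard-coded assumptions rather than the result of `LLVM_ENABLE_ZLIB` / `LLVM_ENABLE_ZSTD`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical stand-in for the patch's CompressionAlgorithm base class.
struct Algorithm {
  std::string Name;
  bool Supported; // Assumption: would mirror LLVM_ENABLE_ZLIB / LLVM_ENABLE_ZSTD.
};

// A "fake enum": a struct holding a single uint8_t, passed around by value.
class Kind {
  uint8_t ID;

  bool isKnown() const { return ID == 1 || ID == 2; }

public:
  constexpr explicit Kind(uint8_t ID) : ID(ID) {}

  static const Kind Zlib;
  static const Kind ZStd;
  static const Kind Unknown;

  // Dispatch to one function-local static instance per algorithm.
  const Algorithm *operator->() const {
    switch (ID) {
    case 1: {
      static const Algorithm ZlibI{"zlib", true};
      return &ZlibI;
    }
    case 2: {
      // Assumption: pretend this build was configured without zstd.
      static const Algorithm ZStdI{"zstd", false};
      return &ZStdI;
    }
    default:
      // Stand-in for llvm_unreachable: callers validate the ID first.
      assert(false && "operator-> called on an unknown compression kind");
      std::abort();
    }
  }

  // True only for a known kind whose algorithm is supported.
  explicit operator bool() const { return isKnown() && (*this)->Supported; }

  friend bool operator==(Kind A, Kind B) { return A.ID == B.ID; }
};

const Kind Kind::Zlib{1};
const Kind Kind::ZStd{2};
const Kind Kind::Unknown{255};
```

With this shape, a caller would write `if (!Kind::Zlib) { /* report missing support */ }` before dereferencing, which matches how the availability checks in this diff are written. The trade-off is that dereferencing an unvalidated kind aborts rather than returning an object whose methods are themselves unreachable.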

llvm/tools/llvm-mc/llvm-mc.cpp

int main(int argc, char **argv) {

std::unique_ptr<MCAsmInfo> MAI(		std::unique_ptr<MCAsmInfo> MAI(
TheTarget->createMCAsmInfo(*MRI, TripleName, MCOptions));		TheTarget->createMCAsmInfo(*MRI, TripleName, MCOptions));
assert(MAI && "Unable to create target asm info!");		assert(MAI && "Unable to create target asm info!");

MAI->setRelaxELFRelocations(RelaxELFRel);		MAI->setRelaxELFRelocations(RelaxELFRel);

if (CompressDebugSections != DebugCompressionType::None) {		if (CompressDebugSections != DebugCompressionType::None) {
if (!compression::zlib::isAvailable()) {		if (!compression::CompressionKind::Zlib) {
WithColor::error(errs(), ProgName)		WithColor::error(errs(), ProgName)
<< "build tools with zlib to enable -compress-debug-sections";		<< "build tools with zlib to enable -compress-debug-sections";
return 1;		return 1;
}		}
MAI->setCompressDebugSections(CompressDebugSections);		MAI->setCompressDebugSections(CompressDebugSections);
}		}
MAI->setPreserveAsmComments(PreserveComments);		MAI->setPreserveAsmComments(PreserveComments);


llvm/tools/llvm-objcopy/ObjcopyOptions.cpp

if (OutputFormat.empty()) {
Config.OutputArch = Target->Machine;		Config.OutputArch = Target->Machine;
}		}
}		}

if (const auto *A = InputArgs.getLastArg(OBJCOPY_compress_debug_sections)) {		if (const auto *A = InputArgs.getLastArg(OBJCOPY_compress_debug_sections)) {
Config.CompressionType = StringSwitch<DebugCompressionType>(A->getValue())		Config.CompressionType = StringSwitch<DebugCompressionType>(A->getValue())
.Case("zlib", DebugCompressionType::Z)		.Case("zlib", DebugCompressionType::Z)
.Default(DebugCompressionType::None);		.Default(DebugCompressionType::None);
if (Config.CompressionType == DebugCompressionType::None)		switch (Config.CompressionType) {
		case DebugCompressionType::None:
return createStringError(		return createStringError(
errc::invalid_argument,		errc::invalid_argument,
"invalid or unsupported --compress-debug-sections format: %s",		"invalid or unsupported --compress-debug-sections format: %s",
A->getValue());		A->getValue());
if (!compression::zlib::isAvailable())		case DebugCompressionType::Z:
		if (!compression::CompressionKind::Zlib)
return createStringError(		return createStringError(
errc::invalid_argument,		errc::invalid_argument,
"LLVM was not compiled with LLVM_ENABLE_ZLIB: can not compress");		"LLVM was not compiled with LLVM_ENABLE_ZLIB: can not compress");
		break;
		}
}		}

Config.AddGnuDebugLink = InputArgs.getLastArgValue(OBJCOPY_add_gnu_debuglink);		Config.AddGnuDebugLink = InputArgs.getLastArgValue(OBJCOPY_add_gnu_debuglink);
// The gnu_debuglink's target is expected to not change or else its CRC would		// The gnu_debuglink's target is expected to not change or else its CRC would
// become invalidated and get rejected. We can avoid recalculating the		// become invalidated and get rejected. We can avoid recalculating the
// checksum for every target file inside an archive by precomputing the CRC		// checksum for every target file inside an archive by precomputing the CRC
// here. This prevents a significant amount of I/O.		// here. This prevents a significant amount of I/O.
if (!Config.AddGnuDebugLink.empty()) {		if (!Config.AddGnuDebugLink.empty()) {
objcopy::parseObjcopyOptions(ArrayRef<const char *> RawArgsArr,
if (Config.DecompressDebugSections &&		if (Config.DecompressDebugSections &&
Config.CompressionType != DebugCompressionType::None) {		Config.CompressionType != DebugCompressionType::None) {
return createStringError(		return createStringError(
errc::invalid_argument,		errc::invalid_argument,
"cannot specify both --compress-debug-sections and "		"cannot specify both --compress-debug-sections and "
"--decompress-debug-sections");		"--decompress-debug-sections");
}		}

if (Config.DecompressDebugSections && !compression::zlib::isAvailable())		if (Config.DecompressDebugSections && !compression::CompressionKind::Zlib)
return createStringError(		return createStringError(
errc::invalid_argument,		errc::invalid_argument,
"LLVM was not compiled with LLVM_ENABLE_ZLIB: cannot decompress");		"LLVM was not compiled with LLVM_ENABLE_ZLIB: cannot decompress");

if (Config.ExtractPartition && Config.ExtractMainPartition)		if (Config.ExtractPartition && Config.ExtractMainPartition)
return createStringError(errc::invalid_argument,		return createStringError(errc::invalid_argument,
"cannot specify --extract-partition together with "		"cannot specify --extract-partition together with "
"--extract-main-partition");		"--extract-main-partition");

llvm/unittests/ProfileData/InstrProfTest.cpp

//===- unittest/ProfileData/InstrProfTest.cpp -------------------- C++ --===//		//===- unittest/ProfileData/InstrProfTest.cpp -------------------- C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

		#include "llvm/ADT/Optional.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"		#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"		#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"		#include "llvm/IR/Module.h"
#include "llvm/ProfileData/InstrProfReader.h"		#include "llvm/ProfileData/InstrProfReader.h"
#include "llvm/ProfileData/InstrProfWriter.h"		#include "llvm/ProfileData/InstrProfWriter.h"
#include "llvm/ProfileData/MemProf.h"		#include "llvm/ProfileData/MemProf.h"
#include "llvm/ProfileData/MemProfData.inc"		#include "llvm/ProfileData/MemProfData.inc"
#include "llvm/Support/Compression.h"		#include "llvm/Support/Compression.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Testing/Support/Error.h"		#include "llvm/Testing/Support/Error.h"
#include "llvm/Testing/Support/SupportHelpers.h"		#include "llvm/Testing/Support/SupportHelpers.h"
#include "gtest/gtest.h"		#include "gtest/gtest.h"
#include <cstdarg>		#include <cstdarg>

using namespace llvm;		using namespace llvm;
		using namespace llvm::compression;

LLVM_NODISCARD static ::testing::AssertionResult		LLVM_NODISCARD static ::testing::AssertionResult
ErrorEquals(instrprof_error Expected, Error E) {		ErrorEquals(instrprof_error Expected, Error E) {
instrprof_error Found;		instrprof_error Found;
std::string FoundMsg;		std::string FoundMsg;
handleAllErrors(std::move(E), [&](const InstrProfError &IPE) {		handleAllErrors(std::move(E), [&](const InstrProfError &IPE) {
Found = IPE.get();		Found = IPE.get();
FoundMsg = IPE.message();		FoundMsg = IPE.message();
for (int I = 0; I < 3; I++) {
str.clear();		str.clear();
OS << "BlahblahBlahblahBar_" << I;		OS << "BlahblahBlahblahBar_" << I;
FuncNames2.push_back(OS.str());		FuncNames2.push_back(OS.str());
}		}

for (bool DoCompression : {false, true}) {		for (bool DoCompression : {false, true}) {
// Compressing:		// Compressing:
std::string FuncNameStrings1;		std::string FuncNameStrings1;
EXPECT_THAT_ERROR(collectPGOFuncNameStrings(		EXPECT_THAT_ERROR(
FuncNames1,		collectPGOFuncNameStrings(FuncNames1,
(DoCompression && compression::zlib::isAvailable()),		DoCompression && CompressionKind::Zlib
		? llvm::Optional<CompressionKind>(
		compression::CompressionKind::Zlib)
		: llvm::NoneType(),
FuncNameStrings1),		FuncNameStrings1),
Succeeded());		Succeeded());

// Compressing:		// Compressing:
std::string FuncNameStrings2;		std::string FuncNameStrings2;
EXPECT_THAT_ERROR(collectPGOFuncNameStrings(		EXPECT_THAT_ERROR(
FuncNames2,		collectPGOFuncNameStrings(FuncNames2,
(DoCompression && compression::zlib::isAvailable()),		DoCompression && CompressionKind::Zlib
		? llvm::Optional<CompressionKind>(
		compression::CompressionKind::Zlib)
		: llvm::NoneType(),
FuncNameStrings2),		FuncNameStrings2),
Succeeded());		Succeeded());

for (int Padding = 0; Padding < 2; Padding++) {		for (int Padding = 0; Padding < 2; Padding++) {
// Join with paddings :		// Join with paddings :
std::string FuncNameStrings = FuncNameStrings1;		std::string FuncNameStrings = FuncNameStrings1;
for (int P = 0; P < Padding; P++) {		for (int P = 0; P < Padding; P++) {
FuncNameStrings.push_back('\0');		FuncNameStrings.push_back('\0');
}		}
FuncNameStrings += FuncNameStrings2;		FuncNameStrings += FuncNameStrings2;

llvm/unittests/Support/CompressionTest.cpp

	#include "llvm/Support/Error.h"			#include "llvm/Support/Error.h"
	#include "gtest/gtest.h"			#include "gtest/gtest.h"

	using namespace llvm;			using namespace llvm;
	using namespace llvm::compression;			using namespace llvm::compression;

	namespace {			namespace {

	#if LLVM_ENABLE_ZLIB			static void testCompressionAlgorithm(
	static void testZlibCompression(StringRef Input, int Level) {			StringRef Input, int Level, CompressionKind CompressionScheme,
				std::string ExpectedDestinationBufferTooSmallErrorMessage) {
	SmallVector<uint8_t, 0> Compressed;			SmallVector<uint8_t, 0> Compressed;
	SmallVector<uint8_t, 0> Uncompressed;			SmallVector<uint8_t, 0> Uncompressed;
	zlib::compress(arrayRefFromStringRef(Input), Compressed, Level);			CompressionScheme->compress(arrayRefFromStringRef(Input), Compressed, Level);

	// Check that uncompressed buffer is the same as original.			// Check that uncompressed buffer is the same as original.
	Error E = zlib::uncompress(Compressed, Uncompressed, Input.size());			Error E =
				CompressionScheme->decompress(Compressed, Uncompressed, Input.size());
	consumeError(std::move(E));			consumeError(std::move(E));

	EXPECT_EQ(Input, toStringRef(Uncompressed));			EXPECT_EQ(Input, toStringRef(Uncompressed));
	if (Input.size() > 0) {			if (Input.size() > 0) {
	// Uncompression fails if expected length is too short.			// Uncompression fails if expected length is too short.
	E = zlib::uncompress(Compressed, Uncompressed, Input.size() - 1);			E = CompressionScheme->decompress(Compressed, Uncompressed,
	EXPECT_EQ("zlib error: Z_BUF_ERROR", llvm::toString(std::move(E)));			Input.size() - 1);
				EXPECT_EQ(ExpectedDestinationBufferTooSmallErrorMessage,
				llvm::toString(std::move(E)));
	}			}
	}			}

				#if LLVM_ENABLE_ZLIB
				static void testZlibCompression(StringRef Input, int Level) {
				testCompressionAlgorithm(Input, Level, CompressionKind::Zlib,
				"zlib error: Z_BUF_ERROR");
				}

	TEST(CompressionTest, Zlib) {			TEST(CompressionTest, Zlib) {
	testZlibCompression("", zlib::DefaultCompression);			CompressionKind CompressionScheme = CompressionKind::Zlib;
				testZlibCompression("", CompressionScheme->DefaultLevel);

	testZlibCompression("hello, world!", zlib::NoCompression);			testZlibCompression("hello, world!", CompressionScheme->BestSizeLevel);
	testZlibCompression("hello, world!", zlib::BestSizeCompression);			testZlibCompression("hello, world!", CompressionScheme->BestSpeedLevel);
	testZlibCompression("hello, world!", zlib::BestSpeedCompression);			testZlibCompression("hello, world!", CompressionScheme->DefaultLevel);
	testZlibCompression("hello, world!", zlib::DefaultCompression);

	const size_t kSize = 1024;			const size_t kSize = 1024;
	char BinaryData[kSize];			char BinaryData[kSize];
	for (size_t i = 0; i < kSize; ++i)			for (size_t i = 0; i < kSize; ++i)
	BinaryData[i] = i & 255;			BinaryData[i] = i & 255;
	StringRef BinaryDataStr(BinaryData, kSize);			StringRef BinaryDataStr(BinaryData, kSize);

	testZlibCompression(BinaryDataStr, zlib::NoCompression);			testZlibCompression(BinaryDataStr, CompressionScheme->BestSizeLevel);
	testZlibCompression(BinaryDataStr, zlib::BestSizeCompression);			testZlibCompression(BinaryDataStr, CompressionScheme->BestSpeedLevel);
	testZlibCompression(BinaryDataStr, zlib::BestSpeedCompression);			testZlibCompression(BinaryDataStr, CompressionScheme->DefaultLevel);
	testZlibCompression(BinaryDataStr, zlib::DefaultCompression);
	}			}
	#endif			#endif

	#if LLVM_ENABLE_ZSTD			#if LLVM_ENABLE_ZSTD
	static void testZstdCompression(StringRef Input, int Level) {
	SmallVector<uint8_t, 0> Compressed;
	SmallVector<uint8_t, 0> Uncompressed;
	zstd::compress(arrayRefFromStringRef(Input), Compressed, Level);

	// Check that uncompressed buffer is the same as original.			static void testZStdCompression(StringRef Input, int Level) {
	Error E = zstd::uncompress(Compressed, Uncompressed, Input.size());			testCompressionAlgorithm(Input, Level, CompressionKind::ZStd,
	consumeError(std::move(E));			"Destination buffer is too small");

	EXPECT_EQ(Input, toStringRef(Uncompressed));
	if (Input.size() > 0) {
	// Uncompression fails if expected length is too short.
	E = zstd::uncompress(Compressed, Uncompressed, Input.size() - 1);
	EXPECT_EQ("Destination buffer is too small", llvm::toString(std::move(E)));
	}
	}			}

	TEST(CompressionTest, Zstd) {			TEST(CompressionTest, Zstd) {
	testZstdCompression("", zstd::DefaultCompression);			CompressionKind CompressionScheme = CompressionKind::ZStd;
				testZStdCompression("", CompressionScheme->DefaultLevel);

	testZstdCompression("hello, world!", zstd::NoCompression);			testZStdCompression("hello, world!", CompressionScheme->BestSizeLevel);
	testZstdCompression("hello, world!", zstd::BestSizeCompression);			testZStdCompression("hello, world!", CompressionScheme->BestSpeedLevel);
	testZstdCompression("hello, world!", zstd::BestSpeedCompression);			testZStdCompression("hello, world!", CompressionScheme->DefaultLevel);
	testZstdCompression("hello, world!", zstd::DefaultCompression);

	const size_t kSize = 1024;			const size_t kSize = 1024;
	char BinaryData[kSize];			char BinaryData[kSize];
	for (size_t i = 0; i < kSize; ++i)			for (size_t i = 0; i < kSize; ++i)
	BinaryData[i] = i & 255;			BinaryData[i] = i & 255;
	StringRef BinaryDataStr(BinaryData, kSize);			StringRef BinaryDataStr(BinaryData, kSize);

	testZstdCompression(BinaryDataStr, zstd::NoCompression);			testZStdCompression(BinaryDataStr, CompressionScheme->BestSizeLevel);
	testZstdCompression(BinaryDataStr, zstd::BestSizeCompression);			testZStdCompression(BinaryDataStr, CompressionScheme->BestSpeedLevel);
	testZstdCompression(BinaryDataStr, zstd::BestSpeedCompression);			testZStdCompression(BinaryDataStr, CompressionScheme->DefaultLevel);
	testZstdCompression(BinaryDataStr, zstd::DefaultCompression);
	}			}
	#endif			#endif
	}			}
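
The test refactoring above replaces two near-identical helpers with a single one parameterized by the algorithm and by the error message expected when the destination buffer is too small. A rough standalone sketch of that shape follows. Everything in it is hypothetical and not part of the patch: `Codec` stands in for `CompressionAlgorithm`, `StoredCodec` is a toy codec that copies bytes verbatim, and errors are modeled as `std::optional<std::string>` rather than `llvm::Error`.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Hypothetical codec interface; the patch's real one is CompressionAlgorithm.
struct Codec {
  virtual ~Codec() = default;
  virtual Bytes compress(const Bytes &In) const = 0;
  // Returns an error message on failure, std::nullopt on success.
  virtual std::optional<std::string>
  decompress(const Bytes &In, Bytes &Out, std::size_t ExpectedSize) const = 0;
};

// Toy codec that stores bytes verbatim, so the sketch needs no zlib or zstd.
struct StoredCodec : Codec {
  Bytes compress(const Bytes &In) const override { return In; }

  std::optional<std::string>
  decompress(const Bytes &In, Bytes &Out,
             std::size_t ExpectedSize) const override {
    if (In.size() > ExpectedSize)
      return std::string("Destination buffer is too small");
    Out = In;
    return std::nullopt;
  }
};

// One helper for every codec: round-trip the input, then check that an
// undersized destination produces the codec-specific error message.
bool roundTrips(const Codec &C, const std::string &Input,
                const std::string &TooSmallMessage) {
  const Bytes In(Input.begin(), Input.end());
  const Bytes Packed = C.compress(In);

  Bytes Out;
  if (C.decompress(Packed, Out, In.size()).has_value() || Out != In)
    return false;
  if (In.empty())
    return true;

  const std::optional<std::string> Err =
      C.decompress(Packed, Out, In.size() - 1);
  return Err.has_value() && *Err == TooSmallMessage;
}
```

A per-codec wrapper then reduces to one call, for example `roundTrips(StoredCodec{}, "hello, world!", "Destination buffer is too small")`, mirroring how `testZlibCompression` and `testZStdCompression` forward to `testCompressionAlgorithm` in the diff.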

Diff 451247
