This is an archive of the discontinued LLVM Phabricator instance.

[libObject/Decompressor] - Use `resize_for_overwrite` in Decompressor::resizeAndDecompress().
Accepted · Public

Authored by grimar on Dec 23 2020, 7:01 AM.

Details

Summary

I've been debugging code that tries to decompress a large
chunk of data. In my case the size of the decompressed data was expected
to be ~4 GB.

I've found that the following code

SmallString<0> Out;
...
Out.resize(DecompressedSize);

takes too long. Instead of `resize`, we can use `resize_for_overwrite` here.
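
For reference, a simplified sketch of what the patched method looks like
(the exact in-tree shape in llvm/include/llvm/Object/Decompressor.h may
differ slightly):

// Sketch: the only functional change is the resize call.
template <class T> Error resizeAndDecompress(T &Out) {
  // Skip value-initialization: on success every byte is overwritten by
  // the decompressor, and on failure the buffer is discarded anyway.
  Out.resize_for_overwrite(DecompressedSize);
  return decompress({Out.data(), (size_t)DecompressedSize});
}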

In the Debug configuration, the Out.resize(0xffffffff) call takes 44.1s,
while Out.resize_for_overwrite(0xffffffff) takes 10.8s for me, i.e. a ~4.08x speedup.

For the Release configuration the numbers are:
resize_for_overwrite takes ~1700 microseconds, while resize takes ~1850000 microseconds,
i.e. resize_for_overwrite is ~1088x faster.

This is very noticeable when decompression does not happen at all, e.g. when the compressed data
is corrupted. In that case the new Decompressor::resizeAndDecompress() finishes much faster.

Diff Detail

Event Timeline

grimar requested review of this revision. Dec 23 2020, 7:01 AM
grimar created this revision.
Herald added a project: Restricted Project. Dec 23 2020, 7:01 AM

Good to know that D93532 added resize_for_overwrite.

For Release configuration the numbers are: resize_for_overwrite takes ~1700 microseconds, resize takes ~1850000 microseconds. I.e. the new version is ~1088x faster.

IMO this may be misleading. The new code appears to be faster because previously memset(0) was called on the buffer, which triggered page faults and committed all the pages immediately.
With the new change that cost is amortized into the decompression itself (see the sketch below). I think we should measure the total time including decompression.

LG once the description is fixed or this information is removed from it.
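
A minimal illustration of the initialization difference being discussed,
assuming LLVM's SmallString and the resize_for_overwrite added in D93532:

#include "llvm/ADT/SmallString.h"
using namespace llvm;

void grow(SmallString<0> &S, size_t N) {
  // resize() value-initializes the new tail -- for char, a memset(0) --
  // so every page of the buffer is written (and therefore faulted in)
  // right here, before any real data arrives.
  S.resize(N);
}

void growForOverwrite(SmallString<0> &S, size_t N) {
  // resize_for_overwrite() default-initializes the tail: no stores are
  // issued, so the kernel commits pages lazily on first write during
  // decompression. The cost moves; it does not fully disappear.
  S.resize_for_overwrite(N);
}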

In my case the compressed data is corrupted, i.e. decompression does not happen at all. I'll update the description.

grimar retitled this revision from [libObject/Decompressor] - Speedup Decompressor::resizeAndDecompress() method. to [libObject/Decompressor] - Use `resize_for_overwrite` in Decompressor::resizeAndDecompress(). Dec 23 2020, 11:34 PM
grimar edited the summary of this revision. (Show Details)
grimar edited the summary of this revision. (Show Details) Dec 23 2020, 11:37 PM

IMO this may be misleading. The new code appears to be faster because previously memset(0) was called on the buffer, which triggered page faults and committed all the pages immediately.
With the new change that cost is amortized into the decompression itself. I think we should measure the total time including decompression.

It would be interesting to see the difference. It would also be nice to measure the performance of decompression alone, to see how much slowdown there is when the pages haven't been allocated eagerly.
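
One possible way to split that measurement (a hypothetical variant of the
benchmark below; it assumes Decompressor::decompress(MutableArrayRef<char>)
can be called directly):

// Time the allocation and the decompression separately so the
// page-fault cost can be attributed to one phase or the other.
auto T1 = std::chrono::high_resolution_clock::now();
SmallString<0> Out;
Out.resize_for_overwrite(Size); // or Out.resize(Size) for the old path
auto T2 = std::chrono::high_resolution_clock::now();
if (Error Err = Decompressor.decompress({Out.data(), Out.size()}))
  return 1;
auto T3 = std::chrono::high_resolution_clock::now();
// (T2 - T1): resize cost; (T3 - T2): decompression plus any lazy faults.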

grimar added a comment. (Edited) Jan 13 2021, 4:11 AM

OK. I've used the following piece of code to construct 1 GiB of random data, compress it, and estimate
the difference between the old and the new code in a Release build.

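// Assumed includes for this fragment: <chrono>, <cstdlib>, <vector>,
// "llvm/Support/Compression.h", "llvm/Object/Decompressor.h".
// Phase 1: generate 1 GiB of random input data.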
auto T1 = std::chrono::high_resolution_clock::now();
constexpr uint64_t Size = 1024 * 1024 * 1024uLL;
std::vector<uint8_t> Data;
Data.reserve(Size);
for (size_t I = 0; I != Size; ++I)
  Data.push_back(rand());

auto T2 = std::chrono::high_resolution_clock::now();
outs()
    << "Create data (microseconds): "
    << std::chrono::duration_cast<std::chrono::microseconds>(T2 - T1).count()
    << "\n";

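// Phase 2: compress the random (hence mostly incompressible) data with zlib.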
SmallVector<char, 1> CompressedData;
T1 = std::chrono::high_resolution_clock::now();
if (Error Err = zlib::compress(
        StringRef(reinterpret_cast<const char *>(Data.data()), Size),
        CompressedData)) {
  return 1;
}
T2 = std::chrono::high_resolution_clock::now();
outs()
    << "Compress data (microseconds): "
    << std::chrono::duration_cast<std::chrono::microseconds>(T2 - T1).count()
    << "\n";
outs() << "Compressed data size: " << CompressedData.size() << "\n";

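// Wrap the compressed bytes and set the expected decompressed size
// directly for the experiment (in-tree the usual entry point is
// Decompressor::create(); the constructor and DecompressedSize are
// presumably made accessible for this test).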
object::Decompressor Decompressor(
    StringRef(CompressedData.data(), CompressedData.size()));
Decompressor.DecompressedSize = Size;

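// Phase 3: time resizeAndDecompress() itself -- the code path this patch changes.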
T1 = std::chrono::high_resolution_clock::now();
SmallString<0> DecompressedData;
if (Error Err = Decompressor.resizeAndDecompress(DecompressedData))
  return 1; // an unchecked llvm::Error asserts on destruction
T2 = std::chrono::high_resolution_clock::now();
outs()
    << "Decompress data (microseconds): "
    << std::chrono::duration_cast<std::chrono::microseconds>(T2 - T1).count()
    << "\n";

Below is the average result from 3 runs:

----------------------------------------
With `resize`:
----------------------------------------

Create data (microseconds): 16963852
Compress data (microseconds): 27999644
Compressed data size: 1074069335
Decompress data (microseconds): 836237

----------------------------------------
With `resize_for_overwrite`:
----------------------------------------

Create data (microseconds): 17025348
Compress data (microseconds): 27997363
Compressed data size: 1074069335
Decompress data (microseconds): 549606

549606 / 836237 ≈ 0.657, i.e. the new code takes about 34% less time for me.

jhenderson accepted this revision. Jan 20 2021, 12:32 AM

Looks fine to me in principle, though I don't use any of the compression/decompression functionality, so I'm not necessarily a fair judge. Maybe give it a bit more time for someone else to comment.

This revision is now accepted and ready to land. Jan 20 2021, 12:32 AM