This is an archive of the discontinued LLVM Phabricator instance.

[modules] Support zstd in .pcm file
ClosedPublic

Authored by MaskRay on Nov 12 2022, 12:01 AM.

Details

Summary

Extend SM_SLOC_BUFFER_BLOB_COMPRESSED to allow zstd, which is much faster
(compression/decompression) than zlib with a similar compression ratio.

An alternative is to add a value beside SM_SLOC_BUFFER_BLOB_COMPRESSED, but
reusing SM_SLOC_BUFFER_BLOB_COMPRESSED slightly simplifies the implementation
and leads to better diagnostics when a slightly older Clang consumes zstd
compressed blob.

Compressing AST takes a small portion of WriteAST, so we can pick a higher
compression level.

Compiling a relatively large .pcm (absl endian) with -fmodules-embed-all-files,
zstd level 9 has comparable performance with zlib-chromium level 6 (default),
but provides smaller output (5809156 => 5796016). Higher zstd levels will make
"Compress AST" notably slower and do not provide significant more size saving.

2.219345 Total ExecuteCompiler
0.746799 Total Frontend
0.736862 Total Source
0.339434 Total ReadAST
0.165452 Total WriteAST
0.043045 Total Compress AST
0.008236 Total ParseClass
0.00633 Total InstantiateClass
0.001887 Total isPotentialConstantExpr
0.001808 Total InstantiateFunction
0.001535 Total EvaluateForOverflow
0.000986 Total EvaluateAsRValue
0.000536 Total EvaluateAsBooleanCondition
0.000308 Total EvaluateAsConstantExpr
0.000156 Total EvaluateAsInt
3.4e-05 Total EvaluateKnownConstInt
8e-06 Total EvaluateAsInitializer
0 Total PerformPendingInstantiations

Diff Detail

Event Timeline

MaskRay created this revision.Nov 12 2022, 12:01 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 12 2022, 12:01 AM
Herald added a subscriber: StephenFan. · View Herald Transcript
MaskRay requested review of this revision.Nov 12 2022, 12:01 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 12 2022, 12:01 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
MaskRay updated this revision to Diff 474953.Nov 12 2022, 11:49 AM
MaskRay edited the summary of this revision. (Show Details)

use level 9

MaskRay edited the summary of this revision. (Show Details)Nov 12 2022, 12:16 PM
dblaikie added inline comments.
clang/lib/Serialization/ASTReader.cpp
1456

Worth a comment about this magic number?

MaskRay updated this revision to Diff 476360.Nov 17 2022, 11:56 PM
MaskRay marked an inline comment as done.

add a comment about the zlib header

Release note?

clang/lib/Serialization/ASTReader.cpp
1457

Could you put the magic number into a named constant?

dblaikie added inline comments.Nov 18 2022, 10:24 AM
clang/lib/Serialization/ASTReader.cpp
1457

& perhaps we should have/test a magic number for zstd too, so it's clear it's one or the other and not something else added in the future (since this isn't reving the bitcode version or anything - reading zstd compressed data with the library before this version will I guess/hopefully appear corrupted, but we could avoid that being the failure mode in the future when another compression scheme is added by checking explicitly for zstd/zlib now and reporting unknown compression scheme otherwise?)?

MaskRay marked 2 inline comments as done.Nov 18 2022, 11:18 AM
MaskRay added inline comments.
clang/lib/Serialization/ASTReader.cpp
1457

Could you put the magic number into a named constant?

I think a constant doesn't improve readability of this used-once value with a comment.

& perhaps we should have/test a magic number for zstd too, so it's clear it's one or the other and not something else added in the future

Since the version number keeps increasing, when a new algorithm is added, they will see a version mismatch error...

dblaikie added inline comments.Nov 18 2022, 11:23 AM
clang/lib/Serialization/ASTReader.cpp
1457

& perhaps we should have/test a magic number for zstd too, so it's clear it's one or the other and not something else added in the future

Since the version number keeps increasing, when a new algorithm is added, they will see a version mismatch error...

Oh, right, sorry, got muddled - thinking bitcode & then thinking LLVM IR bitcode guarantees, super stable, but the AST's super unstable/closely version locked anyway.

MaskRay marked an inline comment as done.Nov 22 2022, 5:57 PM

Ping:)

dblaikie accepted this revision.Nov 23 2022, 10:31 AM

Sounds good

This revision is now accepted and ready to land.Nov 23 2022, 10:31 AM
This revision was landed with ongoing or failed builds.Nov 23 2022, 11:27 AM
This revision was automatically updated to reflect the committed changes.
dyung added a subscriber: dyung.Nov 23 2022, 12:29 PM

@MaskRay your change seems to be causing 1509 failures on the linux PS4 bot. Most of the errors look similar to this:

error: 'error' diagnostics seen but not expected: 
  (frontend): malformed or corrupted AST file: 'LLVM was not built with LLVM_ENABLE_ZSTD or did not find zstd at build time'
1 error generated.

Can you take a look?

@MaskRay your change seems to be causing 1509 failures on the linux PS4 bot. Most of the errors look similar to this:

error: 'error' diagnostics seen but not expected: 
  (frontend): malformed or corrupted AST file: 'LLVM was not built with LLVM_ENABLE_ZSTD or did not find zstd at build time'
1 error generated.

Can you take a look?

Sorry, forgot to include a link to the failing buildbot: https://lab.llvm.org/buildbot/#/builders/139/builds/31513

@MaskRay your change seems to be causing 1509 failures on the linux PS4 bot. Most of the errors look similar to this:

error: 'error' diagnostics seen but not expected: 
  (frontend): malformed or corrupted AST file: 'LLVM was not built with LLVM_ENABLE_ZSTD or did not find zstd at build time'
1 error generated.

Can you take a look?

Back from lunch. Sorry for the breakage. I picked the wrong zlib header... Fixed now.