This is an archive of the discontinued LLVM Phabricator instance.

[clang][modules] Move `UNHASHED_CONTROL_BLOCK` up in the AST file
ClosedPublic

Authored by jansvoboda11 on Aug 22 2023, 6:43 PM.

Details

Summary

When loading (transitively) imported AST file, ModuleManager::addModule() first checks it has the expected signature via readASTFileSignature(). The signature is part of UNHASHED_CONTROL_BLOCK, which is placed at the end of the AST file. This means that just to verify signature of an AST file, we need to skip over all top-level blocks, paging in the whole AST file from disk. This is pretty slow.

This patch moves UNHASHED_CONTROL_BLOCK to the start of the AST file, so that it can be read more efficiently. To achieve this, we use dummy signature when first emitting the unhashed control block, and then backpatch the real signature at the end of the serialization process.

This speeds up dependency scanning by over 9% and significantly reduces run-to-run variability of my benchmarks.

Depends on D158572.

Diff Detail

Event Timeline

jansvoboda11 created this revision.Aug 22 2023, 6:43 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 22 2023, 6:43 PM
jansvoboda11 requested review of this revision.Aug 22 2023, 6:43 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 22 2023, 6:43 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
benlangmuir added inline comments.Aug 23 2023, 10:57 AM
clang/lib/Serialization/ASTReader.cpp
2685

uint8_t has no endianness and has alignment 1 anyway; you can just load it directly or use the same implementation as ASTFileSignature::create

clang/lib/Serialization/ASTWriter.cpp
1143–1172

I wonder if we should hoist these dummy values to the ASTFileSignature type and then assert that they are never read from an ASTReader

1152

We don't need an endian writer when it's bytes

1161

Nit: I think this would be clearer with an early-exit now that it's in a separate function.

Good suggestions all around, thanks!

clang/lib/Serialization/ASTWriter.cpp
1169

I don't feel great about removing const from Buffer and writing into it directly, circumventing Stream. This currently works fine, because the Stream in ASTWriter is never backed by a file (and therefore never flushed). But if that ever changes, this code is problematic. Do you think this is worth spending more time on?

Stream already has the BackpatchWord() function, which makes sure the underlying file is updated as well in case we're backpatching already-flushed data.

benlangmuir added inline comments.Aug 23 2023, 12:59 PM
clang/lib/Serialization/ASTWriter.cpp
1169

Interesting; what was the reason for not using BackpatchWord from the start? IIUC our signatures should be a multiple of 4 bytes already.

Use Stream::BackpatchWord() instead of manipulating Buffer directly.

Early return.

jansvoboda11 marked 5 inline comments as done.Aug 23 2023, 2:54 PM
jansvoboda11 added inline comments.
clang/lib/Serialization/ASTWriter.cpp
1169

Not sure how I arrived at this. This is fixed in the latest revision.

benlangmuir accepted this revision.Aug 23 2023, 2:54 PM
This revision is now accepted and ready to land.Aug 23 2023, 2:54 PM
This revision was landed with ongoing or failed builds.Aug 24 2023, 9:16 AM
This revision was automatically updated to reflect the committed changes.
jansvoboda11 marked an inline comment as done.