This is an archive of the discontinued LLVM Phabricator instance.

[Tooling/DependencyScanning & Preprocessor] Refactor dependency scanning to record and use pre-lexed preprocessor directive tokens, instead of minimized sources
AbandonedPublic

Authored by akyrtzi on Apr 29 2022, 10:34 AM.

Details

Summary

Make the special lexing for dependency scanning a first-class feature of the Preprocessor and Lexer, instead of implementation detail of DependencyScanningFilesystem. This has the following benefits:

  • Full access to the preprocessor state during dependency scanning. E.g. a component can see what includes were taken and where they were located in the actual sources.
  • Improved performance for dependency scanning. Measurements for dependency scanning of clang sources shows reduction in wall time
    • M1Pro: reduction of about -10.6%
    • iMacPro: reduction of about -14%
  • Opportunity to use dependency scanning lexing to speed-up skipping of excluded conditional blocks during normal preprocessing (as follow-up, not part of this patch).

Since, after this change, we don't minimize sources and pass them in place of the real sources, DependencyScanningFilesystem is not technically necessary, but it has valuable performance benefits for caching file stats along with the results of scanning the sources. So the setup of using the DependencyScanningFilesystem during a dependency scan remains.

Diff Detail

Event Timeline

akyrtzi created this revision.Apr 29 2022, 10:34 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 29 2022, 10:34 AM
Herald added a subscriber: mgorny. · View Herald Transcript
akyrtzi requested review of this revision.Apr 29 2022, 10:34 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 29 2022, 10:34 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
akyrtzi updated this revision to Diff 426114.Apr 29 2022, 10:40 AM

Update documentation comments for Lexer::seek()

I finished an initial pass through the code and it looks good for the most parts. I left a couple of suggestions in-line.

Did you test this change on some larger body of code? That would make me more confident, since this is a non-trivial change.

Might be good to get other reviewers to chime in.

clang/include/clang/Lex/PreprocessorOptions.h
214

To be honest, I'm not a fan of using PreprocessorOptions to carry state between compiler invocations.

Could we implement a different mechanism for this in a prep patch an put DependencyDirectivesForFile there? FailedModules also seem as a state rather than options.

clang/lib/Lex/DependencyDirectivesSourceMinimizer.cpp
1

Can you split out the minimizer -> directives scanner rename into a separate patch? That would help with reviewing.

clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp
204–213

Nit: I think Clang still uses PascalCase for variable names.

clang/test/ClangScanDeps/macro-expansions.cpp
2

Could you provide a short description of what's the intention of this test? Probably not obvious to the uninitiated.

Could you split this into smaller patches?
Does this support C++20 modules or is it limited to clang header modules?
Is there overhead in the non dependency scanning mode?

akyrtzi added inline comments.May 2 2022, 12:16 PM
clang/include/clang/Lex/PreprocessorOptions.h
214

Would you be ok to consider this as a task for follow-up? It seems orthogonal to the intended changes of this patch (I mean the patch doesn't make things worse in that respect AFAICT)

Could you split this into smaller patches?

I'll split up the renames to a separate patch so that it is easier to see the code that affects functionality. Not sure if it can be broken further, after that the changes are interdependent.
Once I separate the renames we can see how it looks, I suspect it will be much easier to review.

Does this support C++20 modules or is it limited to clang header modules?

Dependency scanning should work the same as before for C++20 modules, there should not be a change (either making it worse or better).

Is there overhead in the non dependency scanning mode?

Good suggestion, I'll do some measurements and get back to you.

Is there overhead in the non dependency scanning mode?

Good suggestion, I'll do some measurements and get back to you.

I took measurements for preprocessing the clang sources, with a release+thin-LTO build, and differences are within the noise level.

jansvoboda11 added inline comments.May 5 2022, 3:57 AM
clang/include/clang/Lex/PreprocessorOptions.h
213

You'll need to #include "clang/Basic/FileEntry.h". Build with modules fails otherwise.

I've split this in smaller patches:

https://reviews.llvm.org/D125484 - NFC rename refactorings
https://reviews.llvm.org/D125486 - Remove ExcludedPreprocessorDirectiveSkipMapping
https://reviews.llvm.org/D125487 - Change to producing pre-lexed directive tokens instead of minimized source text
https://reviews.llvm.org/D125488 - Change Preprocessor to use the pre-lexed directive tokens during dependency scanning.

akyrtzi abandoned this revision.May 23 2022, 11:35 AM

This has been superseded by the above set of patches.

clang/include/clang/Lex/Preprocessor.h