This is an archive of the discontinued LLVM Phabricator instance.

[Tooling/DependencyScanning] Refactor dependency scanning to produce pre-lexed preprocessor directive tokens, instead of minimized sources
ClosedPublic

Authored by akyrtzi on May 12 2022, 11:03 AM.

Details

Summary

Depends on D125486

This is 3/4 of a series of patches for making the special lexing for dependency scanning a first-class feature of the Preprocessor and Lexer.
This patch replaces the "source minimization" mechanism with a mechanism that produces lexed dependency directives tokens.

These directive tokens are not yet used by the preprocessor during dependency scanning, this will come in the following patch in the series.

Diff Detail

Event Timeline

akyrtzi created this revision.May 12 2022, 11:03 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 12 2022, 11:03 AM
akyrtzi requested review of this revision.May 12 2022, 11:03 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 12 2022, 11:03 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
akyrtzi updated this revision to Diff 429580.May 15 2022, 5:14 PM

Fix issue where an empty '#' in a line was causing the immediately following preprocessor directive to be skipped.

akyrtzi updated this revision to Diff 429766.May 16 2022, 10:27 AM

Make sure to enable line comments for dependency directive lexing.

akyrtzi updated this revision to Diff 431150.May 21 2022, 10:46 AM

Remove DependencyScanningFilesystem::disableDirectivesScanning() function.
Unlike source minimization which changes the source contents size and needed to be disabled in certain situations, directive lexing keeps the same contents size and does not need to be disabled.

dexonsmith added inline comments.May 21 2022, 11:07 AM
clang/unittests/Lex/DependencyDirectivesScannerTest.cpp
175

Can you clarify why this is changing in this patch? It’s not obvious to me why this would start optimizing internal whitespace of macros.

(I also can’t remember if it’s safe/correct. Is there a way to stringify the contents of a macro? If so, this would change the result… if not, then this seems like an improvement, but maybe better for a separate patch?)

(If there’s a good reason to change it here, don’t want to hold it up, but it’s not explained in the commit message so it’s not obvious to me.)

akyrtzi marked an inline comment as done.May 21 2022, 12:06 PM
akyrtzi added inline comments.
clang/unittests/Lex/DependencyDirectivesScannerTest.cpp
175

Can you clarify why this is changing in this patch? It’s not obvious to me why this would start optimizing internal whitespace of macros.

Minimization was doing printToNewline() after the macro identifier, which just prints verbatim, including any amount of whitespace. OTOH the new way is collecting the tokens of the directive, which ignores unnecessary whitespace.

(I also can’t remember if it’s safe/correct. Is there a way to stringify the contents of a macro? If so, this would change the result…

AFAIK this is correct since the macro body is getting tokenized when the preprocessor sees a #define and the unnecessary whitespace is ignored.

if not, then this seems like an improvement, but maybe better for a separate patch?)

To clarify, minimizeSourceToDependencyDirectives() is only used for testing purposes, once we get the tokens of the directives then we use them directly during preprocessing, so there's no place for preserving the unnecessary whitespace. This change is just inherent to the difference in implementation for minimization vs tokenization.

dexonsmith added inline comments.May 21 2022, 12:13 PM
clang/unittests/Lex/DependencyDirectivesScannerTest.cpp
175

SGTM! Thanks for explaining. (I’ll let others review in detail!)

jansvoboda11 accepted this revision.May 23 2022, 5:16 AM

LGTM, nice cleanup! Left a couple of nits.

clang/lib/Lex/DependencyDirectivesScanner.cpp
148

Can you add a comment explaining the relationship between the members?

clang/unittests/Lex/DependencyDirectivesScannerTest.cpp
114

What's the reason being these changes?

This revision is now accepted and ready to land.May 23 2022, 5:16 AM
akyrtzi updated this revision to Diff 431429.May 23 2022, 11:00 AM
akyrtzi marked an inline comment as done.

Add documentation comments for a couple of fields of Scanner in DependencyDirectivesScanner.cpp

akyrtzi marked an inline comment as done.May 23 2022, 11:02 AM
akyrtzi added inline comments.
clang/lib/Lex/DependencyDirectivesScanner.cpp
148

Done!

clang/unittests/Lex/DependencyDirectivesScannerTest.cpp
114

Originally the directive 'tokens' were:

cxx_export_decl,
cxx_module_decl,
cxx_import_decl,

While all the other token kinds were representing one top-level directive, cxx_export_decl was different in that it was treated like a "modifier" for cxx_module_decl or cxx_import_decl. But this inconsistency did not affect anything since the preprocessor was reading the minimized sources and the cxx_* tokens were generally ignored.

After these changes the preprocessor reads sets of ("one top-level directive kind" + "array of tokens for the directive") and I thought it's a bit better to keep all the directive kinds consistent as representing "one top-level directive", thus I made the change for directive kinds to be:

cxx_module_decl,
cxx_import_decl,
cxx_export_module_decl,
cxx_export_import_decl,

in which case "export module" is treated as a separate top-level directive kind than "module".