This is an archive of the discontinued LLVM Phabricator instance.

[pseudo] Add crude heuristics to choose taken preprocessor branches.
ClosedPublic

Authored by sammccall on Mar 7 2022, 2:35 PM.

Details

Summary

In files where different preprocessing paths are possible, our goal is to
choose a preprocessed token sequence which we can parse that pins down as much
of the grammatical structure as possible.
This forms the "primary parse", and the not-taken branches get parsed later,
and are constrained to be compatible with the primary parse.

Concretely:

int x =
  #ifdef // TAKEN
    2 + 2 + 2 // determined during primary parse to be an expression
  #else
    2 // constrained to be an expression during a secondary parse
  #endif
  ;

Diff Detail

Event Timeline

sammccall created this revision.Mar 7 2022, 2:35 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 7 2022, 2:35 PM
sammccall requested review of this revision.Mar 7 2022, 2:35 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 7 2022, 2:35 PM
hokein added inline comments.Mar 10 2022, 7:07 AM
clang/include/clang/Tooling/Syntax/Pseudo/DirectiveMap.h
123

off-topic: a random idea on the name of DirectiveMap came up to my mind when reading the code, how about DirectiveTree, the comment of it says the structure is a tree, and the BranchChooser is actually a tree-visiting pattern.

clang/include/clang/Tooling/Syntax/Pseudo/Token.h
84

nit: the name doesn't match the LLVM code-style.

clang/lib/Tooling/Syntax/Pseudo/DirectiveMap.cpp
210

We evaluate each conditional directive *independently*, I wonder whether it is important to use any low-hanging tricks to ensure consistent choices are made among different conditional directives. (my gut feeling is that it is nice to have, but we should not worry too much about it at the moment).

For example,

#ifdef __cplusplus
extern "C" {
#endif
....
#ifdef __cplusplus
}
#endif

If we enable the first #if, and the second one should be enabled as well. (this example is a trivial and common pattern, it has been handled by the existing code).

218

nit: add a comment saying higher is better.

225

Maybe (-Errors, Tokens, Directives) > (-Other.Errors, ...)

295

nit: !C.Taken => !C.Token.hasValue(), which is clearer.

305

nit: instead of using trivially, I think confidently fits better what the method does.

sammccall marked 6 inline comments as done.Mar 30 2022, 8:00 AM
sammccall added inline comments.
clang/lib/Tooling/Syntax/Pseudo/DirectiveMap.cpp
210

Added a comment describing this case, I don't think we should tackle this now.

305

You're right this is a bad name, but also because returning false doesn't mean it's not trivially/confidently taken, but rather that it's trivially/confidently *not* taken.
I think removing the adverb actually makes this clearer.

Also this case is confusing

#if foo
#else // is this "always taken"?
#bar

Renamed to isTakenWhenReached, wdyt?

sammccall updated this revision to Diff 419142.Mar 30 2022, 8:01 AM
sammccall marked 2 inline comments as done.

Rebase and address comments

Herald added a project: Restricted Project. · View Herald TranscriptMar 30 2022, 8:01 AM
hokein accepted this revision.Apr 1 2022, 1:39 AM

Thanks, looks good!

clang-tools-extra/pseudo/include/clang-pseudo/DirectiveMap.h
95 ↗(On Diff #419142)

it is done in this patch, can be removed now.

clang-tools-extra/pseudo/lib/DirectiveMap.cpp
329 ↗(On Diff #419142)

I thought the first token of the Directive should always be tok::hash, isn't it?

This revision is now accepted and ready to land.Apr 1 2022, 1:39 AM
sammccall marked an inline comment as done.Apr 6 2022, 8:22 AM
sammccall added inline comments.
clang-tools-extra/pseudo/lib/DirectiveMap.cpp
329 ↗(On Diff #419142)

Yeah, this was defensive against parsing recovery, but we're sufficiently close to the parser that we don't need to be.

This revision was automatically updated to reflect the committed changes.
sammccall marked an inline comment as done.