This is an archive of the discontinued LLVM Phabricator instance.

[Syntax] Simplify TokenCollector::Builder, use captured expansion bounds. NFC
ClosedPublic

Authored by sammccall on Apr 6 2020, 5:46 PM.

Details

Summary

The motivation here is fixing https://bugs.llvm.org/show_bug.cgi?id=45428, see
D77507. The fundamental problem is that a "top-level" expansion wasn't precisely
defined. Repairing this concept means that TokenBuffer's "top-level expansion"
may not correspond to a single macro expansion. Example:

#define ID(X) X
#define M 1+ID
M(2); // expands to 1+2

The expansions overlap, but neither expansion alone yields all the tokens.
We need a TokenBuffer::Mapping that corresponds to their union.

This is fairly easy to fix in CollectPPExpansions, but the current design of
TokenCollector::Builder needs a fix too as it relies on the macro's expansion
range rather than the captured expansion bounds. This fix is hard to make due
to the way code is reused within Builder. And honestly, I found that code pretty
hard to reason about too.

The new approach doesn't use the expansion range, but only the expansion
location: it assumes an expansion is the contiguous set of expanded tokens with
the same expansion location, which seems like a reasonable formalization of
the "top-level" notion.

And hopefully the control flow is easier to follow too, it's considerably
shorter even with more documentation.

Diff Detail

Event Timeline

sammccall created this revision.Apr 6 2020, 5:46 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 6 2020, 5:46 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
sammccall updated this revision to Diff 255551.Apr 6 2020, 5:49 PM

drop accidental change

kadircet added inline comments.Apr 7 2020, 2:13 AM
clang/lib/Tooling/Syntax/Tokens.cpp
612

s/Next/NextSpelled

in following lines as well

636

nit: maybe create mapping here and increment NextSpelled and NextExpanded explicitly here.

642

maybe also explicitly spell the assumption:
assert(ExpandedTok.location().isFileID()) ?

661–662

i suppose and each file part refers to NextSpelled but it seems to be confusing maybe have a separate one below ?

kadircet accepted this revision.Apr 7 2020, 2:14 AM

LGTM, thanks!

This revision is now accepted and ready to land.Apr 7 2020, 2:14 AM
sammccall updated this revision to Diff 255618.Apr 7 2020, 2:26 AM

Crash with a nice message if our loop gets stuck.

Looks like

ok Token(`void`, void, length = 4)
ok Token(`test`, identifier, length = 4)
ok Token(`(`, l_paren, length = 1)
ok Token(`int`, int, length = 3)
ok Token(`*`, star, length = 1)
ok Token(`List`, identifier, length = 4)
ok Token(`)`, r_paren, length = 1)
ok Token(`{`, l_brace, length = 1)
ok Token(`0`, numeric_constant, length = 1)
!! Token(``, eof, length = 0)
   Token(`}`, r_brace, length = 1)
   Token(``, eof, length = 0)
sammccall marked 4 inline comments as done.Apr 7 2020, 2:46 AM
sammccall added inline comments.
clang/lib/Tooling/Syntax/Tokens.cpp
636

Not quite sure what you mean here, my guesses...

Hoisting mapping creation to here: If we have file tokens, we're not creating a mapping. Only nontrivial mappings are stored (where the spelled and expanded locations differ). This may be a design mistake but it's not one I'm fixing in this patch :-)

Incrementing NextSpelled and NextExpanded eagerly: if our invariants hold (expanded and spelled tokens really do correspond) then we will indeed increment each of these at least once, so we could structure the code that way. However those invariants are ridiculously subtle and fragile (basically depends on the correctness of the TokenWatcher impl in Preprocessor) so in practice it's good not to advance if our assumptions aren't met so we can actually debug the result. The latest version of the patch makes use of this to detect and crash with a useful message (diagnoseAdvanceFailure).

642

This is very weak compared to the explicitly checked equality in the while loop, which I think spells out the assumption better (these are consecutive spelled tokens).

That assertion would effectively be checking that SpelledTokens doesn't contain any tokens with macro locations, which is trivially true by construction and not really sensible to check in all the places we rely on it.

sammccall updated this revision to Diff 255621.Apr 7 2020, 2:46 AM

address comments

still LG

clang/lib/Tooling/Syntax/Tokens.cpp
526

nit: formatting

636

i was talking about the latter (eagerly incrementing).

With the latest diagnosing mechanism, i also agree that it should stay that way.

641

nit: format

This revision was automatically updated to reflect the committed changes.
sammccall marked 2 inline comments as done.