This is an archive of the discontinued LLVM Phabricator instance.

[Preprocessor] Fix assertion in AnnotatePreviousCachedTokens
AbandonedPublic

Authored by bruno on Dec 2 2015, 7:29 PM.

Details

Summary

Consider the following ObjC++ snippet:

@protocol PA;
@protocol PB;

@class NSArray<ObjectType>;
typedef int some_t;

id<PA> FA(NSArray<id<PB>> *h, some_t group);

This would hit an assertion in the parser after generating an annotation token while
trying to update the token cache:

Assertion failed: (CachedTokens[CachedLexPos-1].getLastLoc() == Tok.getAnnotationEndLoc() && "The annotation should be until the most recent cached token")
...
7 clang::Preprocessor::AnnotatePreviousCachedTokens(clang::Token const&) + 494
8 clang::Parser::TryAnnotateTypeOrScopeTokenAfterScopeSpec(bool, bool, clang::CXXScopeSpec&, bool) + 1163
9 clang::Parser::TryAnnotateTypeOrScopeToken(bool, bool) + 361
10 clang::Parser::isCXXDeclarationSpecifier(clang::Parser::TPResult, bool*) + 598
...

The cached preprocessor token in this case is:

greatergreater '>>' Loc=<testcase.mm:7:24>

while the annotation ("NSArray<id<PB>>") ends at "testcase.mm:7:25", hence the assertion.
The mismatch only happens because of the cached token length and the assertion should account for that.

Diff Detail

Event Timeline

bruno updated this revision to Diff 41709.Dec 2 2015, 7:29 PM
bruno retitled this revision from to [Preprocessor] Fix assertion in AnnotatePreviousCachedTokens.
bruno updated this object.
bruno added reviewers: doug.gregor, akyrtzi.
bruno added subscribers: cfe-commits, dexonsmith.
akyrtzi edited edge metadata.Dec 7 2015, 12:30 PM
akyrtzi added a subscriber: akyrtzi.

Hi Bruno,

bruno added a comment.Dec 7 2015, 1:29 PM

Hi Argyrios,

Thanks for the suggestions, will apply them.
The assertion seems important to catch subtle bugs, are you sure it should be placed inside a "#ifndef NDEBUG”?

bruno updated this revision to Diff 42649.Dec 12 2015, 5:25 PM
bruno edited edge metadata.

Thanks Duncan and Argyrios, updated a patch with the suggestions!

rsmith requested changes to this revision.Dec 17 2015, 11:43 AM
rsmith added a reviewer: rsmith.
rsmith added a subscriber: rsmith.

I think that this will leave us with a broken token stream. In your example, the cached token stream starts as

NSArray < id < PB >> * [...]

... and we try to annotate the id<PB> with our CachedLexPos pointing at the * token. That leaves CachedTokens containing:

NSArray < (type annotation) * [...]

... which is wrong. We need to actually convert the tok::greatergreater in CachedTokens into a tok::greater and update its location and length in order for the cached token stream to be correctly updated. Otherwise if the parser backtracks it will see the wrong token stream.

lib/Lex/PPCaching.cpp
103–105

Please add braces inside this #ifndef block so these variables don't accidentally leak into the rest of the function.

109

that cases -> that case

110

Please only do this extra work if CachedLastTokLoc != TokAnnEndLoc.

This revision now requires changes to proceed.Dec 17 2015, 11:43 AM
bruno updated this revision to Diff 44180.Jan 6 2016, 4:22 PM
bruno edited edge metadata.

Hi Richard,

Thanks for the comments. Updated the patch!

I think that this will leave us with a broken token stream. In your example, the cached token stream starts as

NSArray < id < PB >> * [...]

... and we try to annotate the id<PB> with our CachedLexPos pointing at the * token. That leaves CachedTokens containing:

NSArray < (type annotation) * [...]

... which is wrong. We need to actually convert the tok::greatergreater in CachedTokens into a tok::greater and update its location and length in order for the cached token stream to be correctly updated. Otherwise if the parser backtracks it will see the wrong token stream.

I don't think this happens, the type annotation starts at 7:11 and ends at 7:24:
identifier 'NSArray' Loc=<testcase.mm:7:11>
greatergreater '>>' Loc=<testcase.mm:7:24>

The code that follows the assert then patches the CachedTokens the correct way, see the CachedTokens before and after:

  • Before:

(clang::Preprocessor::CachedTokensTy) $32 = {

[0] = (Loc = 89, UintData = 1, PtrData = 0x0000000000000000, Kind = l_paren, Flags = 0)
[1] = (Loc = 90, UintData = 7, PtrData = 0x000000010d82fba0, Kind = identifier, Flags = 0)
[2] = (Loc = 97, UintData = 1, PtrData = 0x0000000000000000, Kind = less, Flags = 0)
[3] = (Loc = 98, UintData = 2, PtrData = 0x000000010d01da58, Kind = identifier, Flags = 0)
[4] = (Loc = 100, UintData = 1, PtrData = 0x0000000000000000, Kind = less, Flags = 0)
[5] = (Loc = 101, UintData = 2, PtrData = 0x000000010d82fb70, Kind = identifier, Flags = 0)
[6] = (Loc = 103, UintData = 2, PtrData = 0x0000000000000000, Kind = greatergreater, Flags = 0)
[7] = (Loc = 106, UintData = 1, PtrData = 0x0000000000000000, Kind = star, Flags = 2)

}

  • After:

(clang::Preprocessor::CachedTokensTy) $34 = {

[0] = (Loc = 89, UintData = 1, PtrData = 0x0000000000000000, Kind = l_paren, Flags = 0)
[1] = (Loc = 90, UintData = 104, PtrData = 0x000000010d820660, Kind = annot_typename, Flags = 0)
[2] = (Loc = 106, UintData = 1, PtrData = 0x0000000000000000, Kind = star, Flags = 2)

}

bruno updated this revision to Diff 44306.Jan 7 2016, 7:13 PM
bruno edited edge metadata.

Hi Richard,

Thanks for the detailed explanation, it now makes sense to me. I updated the patch with another approach! Let me know if it's the right direction.

bruno added a subscriber: bruno.Jan 15 2016, 8:29 AM

Ping :-)

bruno updated this revision to Diff 45421.Jan 20 2016, 11:22 AM

Update patch after Richard comments

bruno updated this revision to Diff 46332.Jan 28 2016, 7:01 PM

Update the patch to use ArrayRef and remove a misleading assertion.
Richard, any more comments regarding this approach?

Thanks,

doug.gregor accepted this revision.Jan 28 2016, 7:58 PM
doug.gregor edited edge metadata.

This approach looks great to me.

bruno accepted this revision.Jan 30 2016, 4:55 PM
bruno added a reviewer: bruno.

Thanks,

Committed r259311!

bruno accepted this revision.May 31 2016, 12:01 PM
bruno abandoned this revision.