This is an archive of the discontinued LLVM Phabricator instance.

Fix crash on `user defined literals`
ClosedPublic

Authored by eduucaldas on Jun 19 2020, 1:14 AM.

Details

Summary

Given an UserDefinedLiteral 1.2_w:
Problem: Lexer generates one Token for the literal, but ClangAST
references two source locations
Fix: Ignore the operator and interpret it as the underlying literal.
e.g.: 1.2_w token generates syntax node IntegerLiteral(1.2_w)

Diff Detail

Event Timeline

eduucaldas created this revision.Jun 19 2020, 1:14 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 19 2020, 1:14 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Why does this work?

gribozavr2 added inline comments.Jun 29 2020, 12:43 AM
clang/lib/Tooling/Syntax/BuildTree.cpp
632

This comment makes sense, but it is not clear how it is connected to the implementation. Could you clarify? Also, could you re-wrap the comment to 80 columns? Where exactly in the AST do we have the source location that points to _w? What are we doing to avoid using that source location?

eduucaldas updated this revision to Diff 276361.Jul 8 2020, 3:22 AM

Reflect fix on RecursiveASTVisitor

eduucaldas marked an inline comment as done.Jul 8 2020, 3:22 AM

Fix crash on user defined literals

WDYT:

Implement support for user defined literals (which also fixes a crash)

Given an UserDefinedLiteral 1.2_w:
Problem: Lexer generates one Token for the literal, but ClangAST
references two source locations
Fix: Ignore the operator and interpret it as the underlying literal.
e.g.: 1.2_w token generates syntax node IntegerLiteral(1.2_w)

WDYT:

A user defined literal (for example, 1.2_w) is one token. The Clang AST for a user defined literal references two source locations: the beginning of the token (the location of 1 in 1.2_w) and the beginning of the suffix (the location of _). When constructing the syntax tree, we were trying to find a token that starts at the underscore, but couldn't find one, and crashed on an assertion failure. To fix this issue, we ignore the Clang AST nodes for UDLs that have source locations that point in the middle of a token.

clang/lib/Tooling/Syntax/BuildTree.cpp
630

"The semantic AST node for has child nodes that reference two source locations, the location of the beginning of the token (1), and the location of the beginning of the UDL suffix (_). The UDL suffix location does not point to the beginning of a token, so we can't represent the UDL suffix as a separate syntax tree node."

clang/unittests/Tooling/Syntax/TreeTest.cpp
1241–1243

Indent -2.

1244

Could you also add tests for user-defined string literals and user-defined character literals? ("abc"_mystr, u"abc"_mystr, 'c'_mychar)

1309

It looks somewhat weird to me that integer and floating point literals end up with the same syntax tree node type. WDYT about making different nodes for different literals (integer, floating-point, string, character)?

eduucaldas marked 2 inline comments as done.Jul 8 2020, 7:33 AM
eduucaldas added inline comments.
clang/unittests/Tooling/Syntax/TreeTest.cpp
1309

Makes sense. Let's follow the grammar then.

eduucaldas marked 4 inline comments as done.

Add support for {Integer,Float,Char,String}UserDefinedLiteral

workaround size_t

eduucaldas marked 4 inline comments as done.Jul 9 2020, 12:14 AM
eduucaldas added inline comments.
clang/include/clang/Tooling/Syntax/Nodes.h
347–357 ↗(On Diff #276638)

This is gonna be removed once we are able to distinguish between integer and float on raw and template UDL

clang/lib/Tooling/Syntax/BuildTree.cpp
138–139

I know it adds noise, but this is the only place where the formatting doesn't obey the llvm clang format.

647

Here we need logic to determine which kind of UDL, if float or integer

clang/unittests/Tooling/Syntax/TreeTest.cpp
1355–1356

For raw and template UDL, we don't know yet how to get the information about integer of float

eduucaldas updated this revision to Diff 276678.Jul 9 2020, 2:41 AM

Polishing patch

clang/lib/Tooling/Syntax/BuildTree.cpp
630

I changed your comment slightly, WDYT

gribozavr2 added inline comments.Jul 9 2020, 2:58 AM
clang/lib/Tooling/Syntax/BuildTree.cpp
630

LGTM!

eduucaldas updated this revision to Diff 276681.Jul 9 2020, 2:59 AM

Document proposed solution for treating raw literal operators and numeric template operators

Add support for integer and floating UDL even for raw literal operator and
numeric literal operator template

eduucaldas marked an inline comment as done.Jul 9 2020, 10:24 AM
eduucaldas added inline comments.
clang/lib/Tooling/Syntax/BuildTree.cpp
627

Not sure if text is the right thing to get a TokenSpelling from a syntax::Token

eduucaldas marked 2 inline comments as done and an inline comment as not done.Jul 9 2020, 10:26 AM
gribozavr2 added inline comments.Jul 9 2020, 10:54 AM
clang/lib/Tooling/Syntax/BuildTree.cpp
650

Please use parentheses to call the constructor. Braces are correct, but not idiomatic in LLVM.

667

Please allocate an instance of the correct derived type. It is undefined behavior to downcast an object allocated as UserDefinedLiteralExpression to any of its subtypes.

clang/unittests/Tooling/Syntax/TreeTest.cpp
1325

No need to explain the language.

1328

No need to explain the language.

1334

call -> calls? (as in, "this expression calls ...")

1380

I don't understand why this test is separate from the previous one -- why not merge them all into one, or split all of them into one call per test?

1383

ditto, "calls"

Also please add a space before "//".

eduucaldas marked 9 inline comments as done.

Use right function to get TokSpelling, answer comments

clang/unittests/Tooling/Syntax/TreeTest.cpp
1334

it is a noun here, as kind is.

eduucaldas marked 2 inline comments as done.Jul 9 2020, 11:34 PM
eduucaldas added inline comments.
clang/lib/Tooling/Syntax/BuildTree.cpp
636

b or B? Should I use camelCase or TitleCase?

648–664

I'll write a comment explaining why this crazyness. Promise not to rant about the lexer that doesn't distinguish between float and integer.

651–653

This is exactly the function used by the parser to get the TokSpelling. see: llvm-project/clang/lib/Sema/SemaExpr.cpp:3633

  • Add comment explaining complication on LOK_Raw and LOK_Template
  • Use FileRange::text instead of Lexer::getSpelling
gribozavr2 accepted this revision.Jul 10 2020, 8:05 AM
gribozavr2 added inline comments.
clang/lib/Tooling/Syntax/BuildTree.cpp
662

... else { assert(Literal.isFloatingLiteral()); return new ... }

This revision is now accepted and ready to land.Jul 10 2020, 8:05 AM
eduucaldas marked an inline comment as done.

Add assert

This revision was automatically updated to reflect the committed changes.